ON THE ERGODICITY OF THE ADAPTIVE METROPOLIS 
Abstract. This paper describes sufficient conditions to ensure the correct ergodicity of the Adaptive Metropolis (AM) algorithm of Haario, Saksman, and Tamminen [9], for target distributions with a non-compact support. The conditions ensuring a strong law of large numbers and a central limit theorem require that the tails of the target density decay super-exponentially and have regular contours. The result is based on two ingredients. The first is the ergodicity of an auxiliary process that is sequentially constrained to feasible adaptation sets. The second is a pair of independent estimates, for the growth rate of the AM chain and for the corresponding geometric drift constants. The ergodicity result for the constrained process is obtained through a modification of the approach due to Andrieu and Moulines [1].



1. Introduction 

The Markov chain Monte Carlo (MCMC) method, first proposed by [13], is a commonly used device for the numerical approximation of integrals of the type

$$\pi(f) := \int f(x)\,\pi(x)\,dx,$$

where $\pi$ is a probability density function. Intuitively, the method is based on producing a sample $(X_k)_{k=1}^n$ of random variables from the distribution that $\pi$ defines. The integral $\pi(f)$ is approximated with the average $I_n := n^{-1}\sum_{k=1}^n f(X_k)$. In particular, the random variables $(X_k)_{k=1}^n$ are a realisation of a Markov chain, constructed so that the chain has $\pi$ as the unique invariant distribution.

One of the most commonly applied constructions of such a chain in $\mathbb{R}^d$ is to let $X_0 = x_0$ with some fixed point $x_0 \in \mathbb{R}^d$, and recursively for $n \ge 1$,

(1) simulate $Y_n = X_{n-1} + U_n$, where $U_n$ is an independent random variable distributed according to some symmetric proposal distribution $q$, e.g. a zero-mean Gaussian, and

(2) with probability $\min\{1, \pi(Y_n)/\pi(X_{n-1})\}$, accept the proposal and set $X_n = Y_n$; otherwise reject the proposal and set $X_n = X_{n-1}$.
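The two steps above can be illustrated with a short Python sketch. It is not taken from the paper. The function name `rw_metropolis`, the use of an unnormalised log-density `log_pi` and the Gaussian proposal covariance `prop_cov` are assumptions made for this example. Working with $\log\pi$ avoids numerical underflow when the ratio $\pi(Y_n)/\pi(X_{n-1})$ is compared with a uniform variate.

```python
import numpy as np

def rw_metropolis(log_pi, x0, n_steps, prop_cov, rng=None):
    """Symmetric random-walk Metropolis with a zero-mean Gaussian proposal.

    log_pi: returns log pi(x) up to an additive constant (assumed interface).
    Returns an array of shape (n_steps + 1, d) holding X_0, ..., X_n.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.atleast_1d(np.asarray(x0, dtype=float))
    chol = np.linalg.cholesky(np.atleast_2d(prop_cov))
    chain = np.empty((n_steps + 1, x.size))
    chain[0] = x
    log_px = log_pi(x)
    for n in range(1, n_steps + 1):
        # Step (1): Y_n = X_{n-1} + U_n with U_n ~ N(0, prop_cov)
        y = x + chol @ rng.standard_normal(x.size)
        log_py = log_pi(y)
        # Step (2): accept with probability min{1, pi(Y_n) / pi(X_{n-1})}
        if np.log(rng.uniform()) < log_py - log_px:
            x, log_px = y, log_py
        chain[n] = x
    return chain
```

For instance, with a standard Gaussian target in two dimensions, the average of $f(x) = x_1^2$ along the chain approximates $\pi(f) = 1$.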



This symmetric random-walk Metropolis algorithm is often efficient enough, even 
in a relatively complex and high-dimensional situation, provided that the proposal 
distribution q is selected properly. Finding a good proposal for a particular problem 
can, however, be a difficult task. 

Recently, a number of publications have described different adaptation techniques that aim to find a good proposal automatically […]; see also the review article […]. It has been common practice to perform trial runs and determine the proposal from their outcome. The recently proposed methods differ in that they adapt on the fly, continuously during the estimation run. In this paper, we focus on the forerunner of these methods, the Adaptive Metropolis (AM) algorithm [9]. AM is a random-walk Metropolis sampler with a Gaussian proposal $q_v$ having covariance $v$. The proposal covariance $v$ is updated continuously during the run, according to the history of the chain. In general, such an adaptation may, if carelessly implemented, destroy the correct ergodicity properties, so that $I_n$ does not converge to $\pi(f)$ as $n \to \infty$ (see, e.g., [15] for an example). For practical considerations of the AM algorithm, the reader may consult [10, 16].

The first ergodicity result for such adaptive algorithms was obtained in the original paper presenting the AM algorithm. More precisely, a strong law of large numbers was proved for bounded functionals, when the algorithm is run on a compact subset of $\mathbb{R}^d$. Since then, several authors have obtained more general conditions under which an adaptive MCMC process preserves the correct ergodicity properties.

- Andrieu and Robert […] established the connection between adaptive MCMC and stochastic approximation, and proposed a general framework for adaptation.
- Atchadé and Rosenthal […] developed further the technique of […].
- Andrieu and Moulines [1] made important progress by generalising the Poisson equation and martingale approximation techniques to the adaptive setting. They proved ergodicity and a central limit theorem for a class of adaptive MCMC schemes.
- Roberts and Rosenthal [15] use an interesting approach based on coupling to show a weak law of large numbers.

However, in the case of AM, all of these techniques either essentially assume that the adapted parameter is constrained to a predefined compact set, or do not present concrete verifiable conditions. The only result that overcomes this assumption is the one by Andrieu and Moulines [1]. Their result, however, requires a modification of the algorithm, including additional re-projections back to some fixed compact set.

This paper describes sufficient conditions under which the AM algorithm preserves the correct ergodicity properties. Under these conditions, $I_n \to \pi(f)$ almost surely as $n\to\infty$ for any function $f$ that is bounded on compact sets and grows at most exponentially as $\|x\|\to\infty$. In addition, we prove a central limit theorem, stating that $n^{-1/2}\sum_{k=1}^n [f(X_k)-\pi(f)]$ converges in distribution to a Gaussian random variable.

Our main result (Theorem 13) holds for the original AM process (without re-projections) with a target distribution supported on $\mathbb{R}^d$. Essentially, two conditions are required:

- the target density $\pi$ must have asymptotically lighter tails than $\pi(x) = ce^{-\|x\|^{\rho}}$ for some $\rho > 1$;
- for large enough $\|x\|$, the sets $A_x := \{y\in\mathbb{R}^d : \pi(y)\ge\pi(x)\}$ must have uniformly regular contours.

Our assumptions are very close to the well-known conditions proposed by Jarner and Hansen [12] to ensure the geometric convergence of a (non-adaptive) Metropolis process.

The ergodicity results for the AM process rely on three main contributions. First, in Section 2 we describe an adaptive MCMC framework in which the adaptation parameter is constrained at each time to a feasible adaptation set. In Section 3 we prove a strong law of large numbers and a central limit theorem for such a process, through a modification of the technique of Andrieu and Moulines [1]. Second, in Section 4 we propose an independent estimate for the growth rate of a process satisfying a general drift condition. Third, in Section 5 we provide an estimate for the constants of geometric drift for a symmetric random-walk Metropolis process, when the target distribution has super-exponentially decaying tails with regular contours.

The paper is essentially self-contained and assumes little background knowledge. Only basic martingale theory is needed to follow the argument. The one exception is a theorem by Meyn and Tweedie [14], restated in Appendix A. Even though we consider only the AM algorithm, our techniques also apply to many other adaptive MCMC schemes of a similar type.



2. General Framework and Notations

We consider an adaptive Markov chain Monte Carlo (MCMC) chain evolving in the space $\mathbb{X}\times\mathbb{S}$. Here $\mathbb{X}$ is the state space of the "MCMC" chain $(X_n)_{n\ge0}$, and the adaptation parameter $(S_n)_{n\ge0}$ evolves in $\mathbb{S}\subset\mathcal{S}$, where $\mathcal{S}$ is a separable normed vector space. We assume an underlying probability space $(\Omega,\mathcal{F},\mathbb{P})$, and denote the expectation with respect to $\mathbb{P}$ by $\mathbb{E}$. The natural filtration of the chain is denoted by $(\mathcal{F}_k)_{k\ge0}$, where $\mathcal{F}_k := \sigma(X_j, S_j : 0\le j\le k)\subset\mathcal{F}$.

We also assume that we are given an increasing sequence $K_0\subset K_1\subset\cdots\subset K_n\subset\cdots\subset\mathbb{S}$ of subsets of the adaptation parameter space $\mathbb{S}$. The random variables $(X_n,S_n)_{n\ge0}$ form a stochastic chain. It starts from $S_0 = s_0\in K_0$ and $X_0 = x_0\in\mathbb{X}$, and for $n\ge0$ it satisfies the following recursion:

$$X_{n+1}\sim P_{S_n}(X_n,\,\cdot\,),\tag{1}$$
$$S_{n+1} = \sigma_{n+1}\big(S_n,\ \eta_{n+1}H(S_n,X_{n+1})\big).\tag{2}$$

The ingredients of the recursion are as follows:

- $P_s$ is a transition probability for each $s\in\mathbb{S}$;
- $H:\mathbb{S}\times\mathbb{X}\to\mathcal{S}$ is an adaptation function;
- $(\eta_n)_{n\ge1}$ is a decreasing sequence of adaptation step sizes $\eta_n\in(0,1)$.

The functions $\sigma_n:\mathbb{S}\times\mathcal{S}\to\mathbb{S}$ are defined as

$$\sigma_n(s,s') := \begin{cases} s+s', & \text{if } s+s'\in K_n,\\ s, & \text{otherwise.}\end{cases}$$
Thus, $\sigma_n$ ensures that $S_n$ lies in $K_n$ for each $n\ge0$. The recursion (2) can also be considered as a constrained Robbins–Monro stochastic approximation; see [1, …] and the references therein.
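A minimal sketch of the recursion (1)–(2) is given below. It assumes the user supplies four callbacks: a sampler for $P_s$, the adaptation function $H$, the step sizes $\eta_n$, and a membership test for $K_n$. All of these names are hypothetical and are not an interface defined in the paper. The constraint $\sigma_n$ simply discards an increment that would take the parameter outside $K_n$.

```python
def constrained_update(s, increment, in_K):
    """sigma_n(s, s'): return s + s' if it lies in K_n, otherwise keep s."""
    candidate = s + increment
    return candidate if in_K(candidate) else s

def run_constrained_chain(sample_next, H, s0, x0, n_steps, eta, in_K_n):
    """Iterate (1)-(2) for n = 0, ..., n_steps - 1.

    sample_next(s, x): draws X_{n+1} ~ P_s(x, .)   (hypothetical callback)
    H(s, x):           adaptation function
    eta(n):            step size eta_n in (0, 1)
    in_K_n(n, s):      True if s belongs to K_n    (hypothetical callback)
    """
    s, x = s0, x0
    for n in range(n_steps):
        x = sample_next(s, x)                                  # (1)
        s = constrained_update(s, eta(n + 1) * H(s, x),
                               lambda c: in_K_n(n + 1, c))     # (2)
    return s, x
```

With $H(s,x) = x - s$, $\eta_n = (n+1)^{-1}$, $s_0 = x_0$ and no active constraint, the parameter becomes the running mean of $X_0,\dots,X_n$. This is the mean component of the AM adaptation discussed in Section 5.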

Let $V:\mathbb{X}\to[1,\infty)$ be a function. We define the $V$-norm of a function $f$ as

$$\|f\|_V := \sup_{x\in\mathbb{X}}\frac{|f(x)|}{V(x)}.$$

As usual, we denote the integral of a function $f$ with respect to a (signed) measure $\mu$ as $\mu(f) := \int f(x)\,\mu(dx)$, and define $Pf(x) := \int f(y)\,P(x,dy)$ for a transition probability $P$. The $V$-norm of a signed measure is defined as

$$\|\mu\|_V := \sup_{|f|\le V}|\mu(f)|.$$

The indicator function of a set $A$ is denoted by $\mathbf{1}_A(x)$; it equals one if $x\in A$ and zero otherwise. In addition, we use the notations $a\vee b := \max\{a,b\}$ and $a\wedge b := \min\{a,b\}$.

3. Ergodicity of Sequentially Constrained Adaptive MCMC 

This section contains general ergodicity results for the sequentially constrained process defined in Section 2. These results can be seen as auxiliary to our results on Adaptive Metropolis in Section 5, but they may be applied to other adaptive MCMC methods as well.

Suppose that the adaptation algorithm has the form given in (1) and (2), and that the following assumptions are satisfied for some $c\ge1$ and $\epsilon\ge0$.

(A1) For each $s\in\mathbb{S}$, the transition probability $P_s$ has $\pi$ as the unique invariant distribution.

(A2) For each $n\ge1$, the following uniform drift and minorisation condition holds for all $s\in K_n$:

$$P_sV(x)\le\lambda_nV(x)+b_n\mathbf{1}_{C_n}(x),\quad\forall x\in\mathbb{X},\tag{3}$$
$$P_s(x,A)\ge\delta_n\nu_s(A),\quad\forall x\in C_n,\ \forall A\subset\mathbb{X}.\tag{4}$$

Here $C_n\subset\mathbb{X}$ is a subset (a minorisation set), and $V:\mathbb{X}\to[1,\infty)$ is a drift function such that $\sup_{x\in C_n}V(x)\le b_n$. Further, $\nu_s$ is a probability measure on $\mathbb{X}$, concentrated on $C_n$. The constants $\lambda_n\in(0,1)$ and $b_n\in(0,\infty)$ are increasing in $n$, and $\delta_n\in(0,1]$ is decreasing in $n$. They are polynomially bounded so that

$$(1-\lambda_n)^{-1}\vee\delta_n^{-1}\vee b_n\le cn^{\epsilon}.$$

(A3) For all $n\ge1$ and any $r\in(0,1]$, there is $c' = c'(r)\ge1$ such that for all $s,s'\in K_n$,

$$\|P_sf-P_{s'}f\|_{V^r}\le c'n^{\epsilon}\|f\|_{V^r}|s-s'|.$$

(A4) There is a $\beta\in[0,1/2]$ such that for all $n\ge1$, $s\in K_n$ and $x\in\mathbb{X}$,

$$|H(s,x)|\le cn^{\epsilon}V^{\beta}(x).$$
Theorem 1. Assume (A1)–(A4) hold, and let $f$ be a function with $\|f\|_{V^\alpha}<\infty$ for some $\alpha\in(0,1-\beta)$. Assume that $\epsilon<\kappa_*^{-1}[(1/2)\wedge(1-\alpha-\beta)]$, where $\kappa_*\ge1$ is an independent constant, and that $\sum_{k=1}^\infty k^{\kappa_*\epsilon-1}\eta_k<\infty$. Then,

$$\frac1n\sum_{k=1}^n f(X_k)\xrightarrow{n\to\infty}\pi(f)\quad\text{almost surely.}\tag{5}$$

The proof of Theorem 1 is postponed to the end of this section. We start with the following lemma, whose proof is given in Appendix A. It shows that polynomially worse bounds for the drift and minorisation constants can make the speed of geometric convergence only polynomially worse.

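Lemma 2. Suppose (A2) holds. Then, for $r\in(0,1]$, one has for all $s\in K_n$ and $k\ge1$

$$\|P_s^k(x,\cdot)-\pi(\cdot)\|_{V^r}\le V^r(x)L_n\rho_n^k$$

with the bound

$$L_n\vee(1-\rho_n)^{-1}\le c_2n^{\kappa_2\epsilon},$$

where $\kappa_2\ge1$ is an independent constant and $c_2 = c_2(c,r)\ge1$.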

Observe that Lemma 2 entails that any function $f$ with $\|f\|_{V^r}<\infty$ is integrable with respect to the measures $\pi$ and $P_s^k(x,\cdot)$, for all $x\in\mathbb{X}$, $k\ge1$ and $s\in\bigcup_{n\ge0}K_n$. The next three results are modified from Proposition 3, Lemma 5 and Proposition 6 of [1], respectively. The first one bounds the regularity of the solutions $\hat f_s$ of the Poisson equation

$$\hat f_s-P_s\hat f_s=f_s-\pi(f_s)\tag{6}$$

for a polynomially Lipschitz family of functions.

Definition 3. Suppose $V:\mathbb{X}\to[1,\infty)$, and let $K_n\subset\mathbb{S}$, $n\ge1$, be an increasing sequence of subsets. A family of functions $\{f_s\}_{s\in\mathbb{S}}$, with $f_s:\mathbb{X}\to\mathbb{R}$, is called $(K_n,V)$-polynomially Lipschitz with constants $c\ge1$, $\epsilon\ge0$, if for all $s,s'\in K_n$ we have

$$\|f_s\|_V\le cn^{\epsilon}\quad\text{and}\quad\|f_s-f_{s'}\|_V\le cn^{\epsilon}|s-s'|.$$

Proposition 4. Suppose that (A1)–(A3) hold, and that the family of functions $\{f_s\}_{s\in\mathbb{S}}$ is $(K_n,V^r)$-polynomially Lipschitz with constants $(c,\epsilon)$, for some $r\in(0,1]$. Then there is an independent constant $\kappa_3\ge1$ and a constant $c_3 = c_3(c,c',r)\ge1$ such that the following hold.

(i) The family $\{P_sf_s\}_{s\in\mathbb{S}}$ is $(K_n,V^r)$-polynomially Lipschitz with constants $(c_3,\kappa_3\epsilon)$.

(ii) Define, for any $s\in\mathbb{S}$, the function

$$\hat f_s := \sum_{k=0}^{\infty}\big[P_s^kf_s-\pi(f_s)\big].\tag{7}$$

Then $\hat f_s$ solves the Poisson equation (6), and the families $\{\hat f_s\}_{s\in\mathbb{S}}$ and $\{P_s\hat f_s\}_{s\in\mathbb{S}}$ are $(K_n,V^r)$-polynomially Lipschitz with constants $(c_3,\kappa_3\epsilon)$. In other words,

$$\|\hat f_s\|_{V^r}+\|P_s\hat f_s\|_{V^r}\le c_3n^{\kappa_3\epsilon},\tag{8}$$
$$\|\hat f_s-\hat f_{s'}\|_{V^r}+\|P_s\hat f_s-P_{s'}\hat f_{s'}\|_{V^r}\le c_3n^{\kappa_3\epsilon}|s-s'|\tag{9}$$

for all $s,s'\in K_n$.
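On a finite state space the series (7) can be evaluated directly. The sketch below is a toy illustration with a single fixed kernel. The matrix `P` and function `f` in the test are made up for the example, and the truncation length `n_terms` is an arbitrary choice. The sketch checks that the truncated series satisfies the Poisson equation (6) up to numerical error. It also checks that $\pi(\hat f_s) = 0$, which holds because every term of (7) has zero $\pi$-mean.

```python
import numpy as np

def stationary(P):
    """Stationary distribution of a finite, ergodic transition matrix P."""
    w, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

def poisson_series(P, f, n_terms=2000):
    """Truncation of (7): sum_{k=0}^{n_terms-1} [P^k f - pi(f)]."""
    pi_f = stationary(P) @ f
    f_hat = np.zeros(len(f))
    term = np.asarray(f, dtype=float)        # P^0 f
    for _ in range(n_terms):
        f_hat += term - pi_f
        term = P @ term                      # P^{k+1} f
    return f_hat
```

Truncation works here because the terms decay geometrically, which is the finite-state analogue of Lemma 2.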

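Proof. Throughout the proof, suppose $s,s'\in K_n$. Part (i) follows easily from Lemma 2, since

$$\|P_sf_s\|_{V^r}\le\|P_sf_s-\pi(f_s)\|_{V^r}+|\pi(f_s)|\le\big[c_2n^{\kappa_2\epsilon}+\pi(V^r)\big]\|f_s\|_{V^r},$$

$$\begin{aligned}\|P_sf_s-P_{s'}f_{s'}\|_{V^r}&\le\|(P_s-P_{s'})f_s\|_{V^r}+\|P_{s'}(f_s-f_{s'})\|_{V^r}\\&\le c'n^{\epsilon}\|f_s\|_{V^r}|s-s'|+cn^{\kappa_2\epsilon}\|f_s-f_{s'}\|_{V^r}\le cn^{(\kappa_2+1)\epsilon}|s-s'|.\end{aligned}$$

Consider then (ii). The estimate (8) follows from the definition of $\hat f_s$ and Lemma 2:

$$\|\hat f_s\|_{V^r}\le\sum_{k=0}^{\infty}\|P_s^kf_s-\pi(f_s)\|_{V^r}\le\sum_{k=0}^{\infty}L_n\rho_n^k\|f_s\|_{V^r}=\frac{L_n}{1-\rho_n}\|f_s\|_{V^r}\le c_2^2c\,n^{(2\kappa_2+1)\epsilon}.$$

The above bound clearly applies also to $\|P_s\hat f_s\|_{V^r}$, and the convergence implies that $\hat f_s$ solves (6).

For (9), define an auxiliary transition probability $\Pi$ by setting $\Pi(x,A) := \pi(A)$, and write

$$P_s^kf-P_{s'}^kf=\sum_{j=0}^{k-1}(P_s^j-\Pi)(P_s-P_{s'})\big[P_{s'}^{k-j-1}f-\pi(f)\big],$$

which holds since $\Pi P_s = \pi$ for all $s$. By Lemma 2 and Assumption (A3), we have for all $s,s'\in K_n$ and $j\ge0$

$$\begin{aligned}\big\|(P_s^j-\Pi)(P_s-P_{s'})\big[P_{s'}^{k-j-1}f-\pi(f)\big]\big\|_{V^r}&\le L_n\rho_n^j\big\|(P_s-P_{s'})\big[P_{s'}^{k-j-1}f-\pi(f)\big]\big\|_{V^r}\\&\le L_n\rho_n^jc'n^{\epsilon}|s-s'|\,\big\|P_{s'}^{k-j-1}f-\pi(f)\big\|_{V^r}\\&\le L_n^2\rho_n^{k-1}c'n^{\epsilon}|s-s'|\,\|f\|_{V^r}.\end{aligned}$$

Summing over $j$ gives

$$\|P_s^kf-P_{s'}^kf\|_{V^r}\le kL_n^2\rho_n^{k-1}c'n^{\epsilon}|s-s'|\,\|f\|_{V^r}.\tag{10}$$

Next, write

$$\hat f_s-\hat f_{s'}=\sum_{k=0}^{\infty}\big[P_s^kf_s-P_{s'}^kf_s\big]-\sum_{k=0}^{\infty}\big[P_{s'}^k(f_{s'}-f_s)-\pi(f_{s'}-f_s)\big].$$

By Lemma 2 and the estimate (10) we have

$$\begin{aligned}\|\hat f_s-\hat f_{s'}\|_{V^r}&\le L_n^2c'n^{\epsilon}|s-s'|\Big(\sum_{k=0}^{\infty}k\rho_n^{k-1}\Big)\|f_s\|_{V^r}+L_n\Big(\sum_{k=0}^{\infty}\rho_n^k\Big)\|f_s-f_{s'}\|_{V^r}\\&\le\Big[L_n^2c'n^{\epsilon}(1-\rho_n)^{-2}cn^{\epsilon}+L_n(1-\rho_n)^{-1}cn^{\epsilon}\Big]|s-s'|\le c_3n^{\kappa_3\epsilon}|s-s'|.\end{aligned}$$

The same bound applies, with a similar argument, to $P_s\hat f_s-P_{s'}\hat f_{s'}$. □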

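Lemma 5. Assume that (A2) holds. Let $r\in[0,1]$, let $(a_n)_{n\ge1}$ be any sequence of positive numbers, and let $(x_0,s_0)\in\mathbb{X}\times K_0$. Then

$$\mathbb{E}\big[V^r(X_k)\big]\le c_4k^{2\epsilon r}V^r(x_0),\tag{11}$$
$$\mathbb{E}\Big[\max_{m\le j\le k}\big(a_jV(X_j)\big)^r\Big]\le c_4\Big(\sum_{j=m}^ka_jj^{2\epsilon}\Big)^rV^r(x_0),\tag{12}$$

where the constant $c_4$ depends only on $c$.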

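Proof. For $(x_0,s_0)\in\mathbb{X}\times K_0$ and $k\ge1$, we can apply the drift inequality (3) and the monotonicity of $\lambda_k$ and $b_k$ to obtain

$$\begin{aligned}\mathbb{E}[V(X_k)]&=\mathbb{E}\big[\mathbb{E}[V(X_k)\mid\mathcal{F}_{k-1}]\big]=\mathbb{E}\big[P_{S_{k-1}}V(X_{k-1})\big]\\&\le\lambda_k\mathbb{E}[V(X_{k-1})]+b_k\le\cdots\le\lambda_k^kV(x_0)+b_k\sum_{j=0}^{k-1}\lambda_k^j\\&\le\Big(1+b_k\sum_{j=0}^{\infty}\lambda_k^j\Big)V(x_0)\le(1+c^2k^{2\epsilon})V(x_0)\le c_4k^{2\epsilon}V(x_0).\end{aligned}\tag{13}$$

Combining this estimate with Jensen's inequality yields, for $r\in[0,1]$,

$$\mathbb{E}[V^r(X_k)]\le\big(\mathbb{E}[V(X_k)]\big)^r\le c_4^rk^{2\epsilon r}V^r(x_0).$$

Similarly, by Jensen's inequality and the estimate (13),

$$\mathbb{E}\Big[\max_{m\le j\le k}\big(a_jV(X_j)\big)^r\Big]\le\Big(\mathbb{E}\Big[\max_{m\le j\le k}a_jV(X_j)\Big]\Big)^r\le\Big(\sum_{j=m}^ka_j\mathbb{E}[V(X_j)]\Big)^r\le c_4^r\Big(\sum_{j=m}^ka_jj^{2\epsilon}\Big)^rV^r(x_0).$$

□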

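Assume that $\{f_s\}_{s\in\mathbb{S}}$ is a sufficiently regular family of functions. Consider the following decomposition, which is one of the key observations in [1]:

$$\sum_{k=1}^n\big[f_{S_{k-1}}(X_k)-\pi(f_{S_{k-1}})\big]=M_n+R_n^{(1)}+R_n^{(2)}.\tag{14}$$

Here $(M_k)_{k\ge1}$ is a martingale with respect to $(\mathcal{F}_k)_{k\ge1}$, and $(R_k^{(1)})_{k\ge1}$ and $(R_k^{(2)})_{k\ge1}$ are "residual" sequences. They are given by

$$M_k := \sum_{\ell=1}^k\big[\hat f_{S_{\ell-1}}(X_\ell)-P_{S_{\ell-1}}\hat f_{S_{\ell-1}}(X_{\ell-1})\big],$$
$$R_k^{(1)} := \sum_{\ell=1}^k\big[P_{S_\ell}\hat f_{S_\ell}(X_\ell)-P_{S_{\ell-1}}\hat f_{S_{\ell-1}}(X_\ell)\big],$$
$$R_k^{(2)} := P_{S_0}\hat f_{S_0}(X_0)-P_{S_k}\hat f_{S_k}(X_k).$$

Recall that $\hat f_s$ solves the Poisson equation (6). The following proposition controls the fluctuations of these terms individually.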
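When the parameter is held fixed ($S_k\equiv s$, no adaptation), $R_n^{(1)}$ vanishes and (14) becomes an exact pathwise identity. The sketch below checks this identity on a hypothetical three-state chain; the chain is not taken from the paper. It computes a solution of the Poisson equation (6) by solving the linear system $(I-P+\mathbf{1}\pi^T)\hat f = f-\pi(f)$, which yields the solution with $\pi(\hat f) = 0$, rather than by summing the series (7).

```python
import numpy as np

def stationary(P):
    """Stationary distribution of a finite, ergodic transition matrix P."""
    w, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(w - 1.0))])
    return pi / pi.sum()

def simulate(P, x0, n, rng):
    """Simulate X_0, ..., X_n of the Markov chain with transition matrix P."""
    path = np.empty(n + 1, dtype=int)
    path[0] = x0
    for k in range(1, n + 1):
        path[k] = rng.choice(len(P), p=P[path[k - 1]])
    return path

def decomposition_terms(P, f, path):
    """Return (sum_k [f(X_k) - pi(f)], M_n, R_n^(2)) for a fixed kernel P."""
    d = len(f)
    pi = stationary(P)
    # Poisson solution with pi(f_hat) = 0: (I - P + 1 pi^T) f_hat = f - pi(f)
    f_hat = np.linalg.solve(np.eye(d) - P + np.outer(np.ones(d), pi), f - pi @ f)
    Pf_hat = P @ f_hat
    lhs = np.sum(f[path[1:]] - pi @ f)
    M = np.sum(f_hat[path[1:]] - Pf_hat[path[:-1]])   # martingale part M_n
    R2 = Pf_hat[path[0]] - Pf_hat[path[-1]]           # telescoping remainder
    return lhs, M, R2
```

In the adaptive case, the extra term $R_n^{(1)}$ collects the changes of $P_s\hat f_s$ caused by the parameter updates.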

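Proposition 6. Assume (A1)–(A4) hold and $(x_0,s_0)\in\mathbb{X}\times K_0$. Let $\{f_s\}_{s\in\mathbb{S}}$ be $(K_n,V^\alpha)$-polynomially Lipschitz with constants $(c,\epsilon)$ for some $\alpha\in(0,1-\beta)$. Fix any $p\in(1,(\alpha+\beta)^{-1}]$, $\delta>0$ and $\xi>\alpha$. Then there is a $c_\star = c_\star(c,p,\alpha,\beta,\xi)\ge1$ such that for all $n\ge1$,

$$\mathbb{P}\Big(\sup_{k\ge n}\frac{|M_k|}{k}>\delta\Big)\le c_\star\delta^{-p}n^{p\gamma-(p/2)\wedge(p-1)}V^{\alpha p}(x_0),\tag{15}$$
$$\mathbb{P}\Big(\sup_{k\ge n}\frac{|R_k^{(1)}|}{k^{\xi}}>\delta\Big)\le c_\star\delta^{-p}\Big(\sum_{j=1}^{\infty}(j\vee n)^{\gamma-\xi}\eta_j\Big)^pV^{(\alpha+\beta)p}(x_0),\tag{16}$$
$$\mathbb{P}\Big(\sup_{k\ge n}\frac{|R_k^{(2)}|}{k^{\xi}}>\delta\Big)\le c_\star\delta^{-p}n^{-p(\xi-\alpha-\gamma)}V^{\alpha p}(x_0).\tag{17}$$

These bounds hold whenever $\epsilon\ge0$ is small enough to ensure that $\gamma := \kappa_*\epsilon<\big[\tfrac12\wedge(1-\tfrac1p)\wedge(\xi-\alpha)\big]$, where $\kappa_*\ge1$ is an independent constant.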



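Proof. In this proof, $c$ denotes a constant whose value may change at each appearance. By Proposition 4, $\|\hat f_s\|_{V^\alpha}+\|P_s\hat f_s\|_{V^\alpha}\le c_3n^{\kappa_3\epsilon}$ for all $s\in K_n$.

Bound for $M$. Since $\alpha p\in[0,1]$, we can bound the martingale differences $dM_\ell := M_\ell-M_{\ell-1}$, $\ell\ge1$, using (11) of Lemma 5:

$$\begin{aligned}\mathbb{E}|dM_\ell|^p&=\mathbb{E}\big|\hat f_{S_{\ell-1}}(X_\ell)-P_{S_{\ell-1}}\hat f_{S_{\ell-1}}(X_{\ell-1})\big|^p\\&\le\mathbb{E}\Big[\big(\|\hat f_{S_{\ell-1}}\|_{V^\alpha}V^\alpha(X_\ell)+\|P_{S_{\ell-1}}\hat f_{S_{\ell-1}}\|_{V^\alpha}V^\alpha(X_{\ell-1})\big)^p\Big]\\&\le2^p(c_3\ell^{\kappa_3\epsilon})^p\big(\mathbb{E}[V^{\alpha p}(X_\ell)]+\mathbb{E}[V^{\alpha p}(X_{\ell-1})]\big)\le c\,\ell^{(\kappa_3+2\alpha)p\epsilon}V^{\alpha p}(x_0).\end{aligned}\tag{18}$$

For $p\ge2$, Burkholder's and Minkowski's inequalities give

$$\mathbb{E}|M_k|^p\le c_p\mathbb{E}\Big[\Big(\sum_{\ell=1}^k|dM_\ell|^2\Big)^{p/2}\Big]\le c_p\Big(\sum_{\ell=1}^k\big(\mathbb{E}|dM_\ell|^p\big)^{2/p}\Big)^{p/2}\le c\,k^{(\kappa_3+2\alpha)p\epsilon+p/2}V^{\alpha p}(x_0),$$

where the constant $c_p$ depends only on $p$. For $1<p<2$, Burkholder's inequality and the estimate (18) yield

$$\mathbb{E}|M_k|^p\le c_p\mathbb{E}\Big[\Big(\sum_{\ell=1}^k|dM_\ell|^2\Big)^{p/2}\Big]\le c_p\sum_{\ell=1}^k\mathbb{E}|dM_\ell|^p\le c\,k^{(\kappa_3+2\alpha)p\epsilon+1}V^{\alpha p}(x_0).$$

The two cases combined give

$$\mathbb{E}|M_k|^p\le c\,k^{(\kappa_3+2\alpha)p\epsilon+(p/2)\vee1}V^{\alpha p}(x_0).\tag{19}$$

Now, by Corollary 23 of Birnbaum and Marshall's inequality in Appendix B, for all $m>n$,

$$\begin{aligned}\mathbb{P}\Big(\max_{n\le k\le m}\frac{|M_k|}{k}>\delta\Big)&\le\delta^{-p}\Big(m^{-p}\mathbb{E}|M_m|^p+\sum_{k=n}^{m-1}\big(k^{-p}-(k+1)^{-p}\big)\mathbb{E}|M_k|^p\Big)\\&\le\delta^{-p}\Big(m^{-p}\mathbb{E}|M_m|^p+p\sum_{k=n}^{m-1}k^{-p-1}\mathbb{E}|M_k|^p\Big).\end{aligned}$$

Let $\kappa_* := \kappa_3+3$. Since $\kappa_*\epsilon+(1/2)\vee(1/p)<1$, (19) gives

$$m^{-p}\mathbb{E}|M_m|^p\le c\,m^{p\kappa_*\epsilon+(p/2)\vee1-p}V^{\alpha p}(x_0)\xrightarrow{m\to\infty}0.$$

Since $p\kappa_*\epsilon-(p/2)\wedge(p-1)<0$, letting $m\to\infty$ proves (15):

$$\mathbb{P}\Big(\sup_{k\ge n}\frac{|M_k|}{k}>\delta\Big)\le c\delta^{-p}\sum_{k=n}^{\infty}k^{(\kappa_3+2\alpha)p\epsilon+(p/2)\vee1-p-1}V^{\alpha p}(x_0)\le c\delta^{-p}n^{p\kappa_*\epsilon-(p/2)\wedge(p-1)}V^{\alpha p}(x_0).$$

Bound for $R^{(1)}$. By Proposition 4, $\|P_s\hat f_s-P_{s'}\hat f_{s'}\|_{V^\alpha}\le c_3n^{\kappa_3\epsilon}|s-s'|$ for $s,s'\in K_n$. By construction, $|S_\ell-S_{\ell-1}|\le\eta_\ell|H(S_{\ell-1},X_\ell)|$. Assumption (A4) ensures that $|H(S_{\ell-1},X_\ell)|\le c\ell^{\epsilon}V^{\beta}(X_\ell)$, so

$$\big|P_{S_\ell}\hat f_{S_\ell}(X_\ell)-P_{S_{\ell-1}}\hat f_{S_{\ell-1}}(X_\ell)\big|\le c_3\ell^{\kappa_3\epsilon}|S_\ell-S_{\ell-1}|V^\alpha(X_\ell)\le c_3\ell^{\kappa_3\epsilon}\eta_\ell c\ell^{\epsilon}V^{\alpha+\beta}(X_\ell).$$

Let $k\ge n$. Since $\ell^{(\kappa_3+1)\epsilon}k^{-\xi}\le(\ell\vee n)^{(\kappa_3+1)\epsilon-\xi}$ for $\ell\le k$, we obtain

$$\max_{n\le k\le m}k^{-\xi}|R_k^{(1)}|\le c\sum_{\ell=1}^{m}(\ell\vee n)^{(\kappa_3+1)\epsilon-\xi}\eta_\ell V^{\alpha+\beta}(X_\ell).$$

Then, by Minkowski's inequality and (11) of Lemma 5,

$$\begin{aligned}\mathbb{E}\Big[\max_{n\le k\le m}k^{-\xi p}|R_k^{(1)}|^p\Big]&\le c\Big(\sum_{\ell=1}^{\infty}(\ell\vee n)^{(\kappa_3+1)\epsilon-\xi}\eta_\ell\big(\mathbb{E}[V^{(\alpha+\beta)p}(X_\ell)]\big)^{1/p}\Big)^p\\&\le c\Big(\sum_{\ell=1}^{\infty}(\ell\vee n)^{(\kappa_3+1+2\alpha+2\beta)\epsilon-\xi}\eta_\ell\Big)^pV^{(\alpha+\beta)p}(x_0).\end{aligned}\tag{20}$$

Bound for $R^{(2)}$. From Proposition 4, $|P_{S_k}\hat f_{S_k}(X_k)|\le c_3k^{\kappa_3\epsilon}V^\alpha(X_k)$. Since $(\kappa_3+2\alpha)\epsilon-(\xi-\alpha)<0$, (12) of Lemma 5 gives

$$\begin{aligned}\mathbb{E}\Big[\max_{n\le k\le m}k^{-\xi p}|P_{S_k}\hat f_{S_k}(X_k)|^p\Big]&\le c_3^p\,\mathbb{E}\Big[\max_{n\le k\le m}\big(k^{(\kappa_3\epsilon-\xi)/\alpha}V(X_k)\big)^{\alpha p}\Big]\\&\le c\Big(\sum_{k=n}^{\infty}k^{(\kappa_3\epsilon-\xi)/\alpha+2\epsilon}\Big)^{\alpha p}V^{\alpha p}(x_0)\le c\,n^{(\kappa_3+2\alpha)p\epsilon+(\alpha-\xi)p}V^{\alpha p}(x_0).\end{aligned}$$

Consequently,

$$\begin{aligned}\mathbb{E}\Big[\sup_{k\ge n}k^{-\xi p}|R_k^{(2)}|^p\Big]&\le2^p\,\mathbb{E}\Big[\sup_{k\ge n}k^{-\xi p}\big(|P_{S_0}\hat f_{S_0}(X_0)|^p+|P_{S_k}\hat f_{S_k}(X_k)|^p\big)\Big]\\&\le2^p\Big(n^{-\xi p}|P_{s_0}\hat f_{s_0}(x_0)|^p+\mathbb{E}\Big[\sup_{k\ge n}k^{-\xi p}|P_{S_k}\hat f_{S_k}(X_k)|^p\Big]\Big)\\&\le c\,n^{(\kappa_3+2\alpha)p\epsilon+(\alpha-\xi)p}V^{\alpha p}(x_0).\end{aligned}\tag{21}$$

The estimates (16) and (17) follow from (20) and (21) by Markov's inequality. □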
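The proof of Theorem 1 follows as a straightforward application of Proposition 6.

Proof of Theorem 1. Let $\delta>0$, and denote

$$B_n^{(\delta)} := \Big\{\omega\in\Omega:\ \sup_{k\ge n}\Big|\frac1k\sum_{j=1}^k\big[f(X_j)-\pi(f)\big]\Big|>\delta\Big\}.$$

Since $\|f\|_{V^\alpha}<\infty$ by assumption, we may consider the family $\{f_s\}_{s\in\mathbb{S}}$ with $f_s = f$ for all $s\in\mathbb{S}$. Then, by the decomposition (14),

$$\mathbb{P}\big(B_n^{(\delta)}\big)\le\mathbb{P}\Big(\sup_{k\ge n}\frac{|M_k|}{k}>\frac\delta3\Big)+\mathbb{P}\Big(\sup_{k\ge n}\frac{|R_k^{(1)}|}{k}>\frac\delta3\Big)+\mathbb{P}\Big(\sup_{k\ge n}\frac{|R_k^{(2)}|}{k}>\frac\delta3\Big).\tag{22}$$

We select $p\in(1,(\alpha+\beta)^{-1})$ so that $\kappa_*\epsilon<1-1/p$, and let $\xi = 1$. Proposition 6 then readily implies that the first and the third terms in (22) converge to zero as $n\to\infty$. For the second term, consider

$$\sum_{j=1}^{\infty}(j\vee n)^{\kappa_*\epsilon-1}\eta_j=n^{\kappa_*\epsilon-1}\sum_{j=1}^{n}\eta_j+\sum_{j=n+1}^{\infty}j^{\kappa_*\epsilon-1}\eta_j.$$

The second sum on the right converges to zero by assumption, and the first term converges to zero by Kronecker's lemma. Hence there is an increasing sequence $(n_k)_{k\ge1}$ such that $\mathbb{P}\big(B_{n_k}^{(1/k)}\big)\le k^{-2}$. Denote $B := \bigcap_{m\ge1}\bigcup_{k\ge m}B_{n_k}^{(1/k)}$. The Borel–Cantelli lemma implies that $\mathbb{P}(B^c) = 1$, and (5) holds for all $\omega\in B^c$. □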

Finally, we prove a central limit theorem along the lines of [1, Theorem 9]. It requires one more condition, with the same constants $c\ge1$ and $\epsilon\ge0$ as in (A1)–(A4).

(A5) There is a $\beta\in[0,1/2]$ such that (A4) holds, and for all $n\ge1$, $x\in\mathbb{X}$ and $s,s'\in K_n$,

$$|H(s,x)-H(s',x)|\le cn^{\epsilon}|s-s'|V^{\beta}(x).$$

Theorem 7. Assume (A1)–(A5) hold. Let $f$ be a function with $\|f\|_{V^\alpha}<\infty$ for some $\alpha\in(0,(1-\beta)/2)$. Assume $\epsilon<\kappa_{**}^{-1}[1/2\wedge(1-2\alpha-\beta)]$ and $\sum_{k=1}^{\infty}k^{\kappa_{**}\epsilon-1/2}\eta_k<\infty$, where $\kappa_{**}\ge1$ is an independent constant. Furthermore, assume that $S_k$ converges a.s. to some random variable $S_\infty$ such that $S_\infty$ belongs to the interior of $K_N$ for some $N = N(\omega)<\infty$. Then

$$\frac{1}{\sqrt n}\sum_{k=1}^n\big[f(X_k)-\pi(f)\big]\xrightarrow{n\to\infty}Z\tag{23}$$

in distribution. Here $Z$ is a random variable with characteristic function $\phi_Z(t) = \mathbb{E}\exp\big(-\tfrac12\sigma_{S_\infty}^2t^2\big)$, where $\sigma_{S_\infty}^2 := \pi\big(\hat f_{S_\infty}^2-(P_{S_\infty}\hat f_{S_\infty})^2\big)<\infty$.

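Proof. Let $\kappa_{**} := 3\kappa_*^2$, where $\kappa_*$ is the independent constant of Theorem 1. Consider again the martingale decomposition (14). As in the proof of Theorem 1, we can choose $p\in(1,(\alpha+\beta)^{-1})$ so that $\kappa_*\epsilon<1-1/p$, and let $\xi = 1/2$. Proposition 6 then implies that $n^{-1/2}(R_n^{(1)}+R_n^{(2)})\to0$ almost surely. So it suffices to show that $n^{-1/2}M_n\to Z$ in distribution. By the central limit theorem for martingales [11, Corollary 3.1], it is sufficient to show that, for all $\varepsilon>0$, the following hold in probability:

$$\frac1n\sum_{k=1}^n\mathbb{E}\big[dM_k^2\mid\mathcal{F}_{k-1}\big]\xrightarrow{n\to\infty}\sigma_{S_\infty}^2\quad\text{and}\tag{24}$$
$$\frac1n\sum_{k=1}^n\mathbb{E}\big[dM_k^2\mathbf{1}_{\{|dM_k|>\varepsilon\sqrt n\}}\mid\mathcal{F}_{k-1}\big]\xrightarrow{n\to\infty}0,\tag{25}$$

where $dM_k := M_k-M_{k-1}$.

Proof of (24). Denote $g_s(x) := P_s\hat f_s^2(x)-(P_s\hat f_s(x))^2$, and notice that

$$\mathbb{E}\big[dM_k^2\mid\mathcal{F}_{k-1}\big]=g_{S_{k-1}}(X_{k-1}).$$

In the present setting, Proposition 4 yields that the families $\{\hat f_s\}_{s\in\mathbb{S}}$ and $\{P_s\hat f_s\}_{s\in\mathbb{S}}$ are $(K_n,V^\alpha)$-polynomially Lipschitz with constants $(c_3,\kappa_3\epsilon)$. Hence $\{\hat f_s^2\}_{s\in\mathbb{S}}$ and $\{(P_s\hat f_s)^2\}_{s\in\mathbb{S}}$ are $(K_n,V^{2\alpha})$-polynomially Lipschitz with constants $(2c_3^2,2\kappa_3\epsilon)$. Since $\kappa_*\ge\kappa_3\vee\kappa_2$, it follows that $\{g_s\}_{s\in\mathbb{S}}$ is $(K_n,V^{2\alpha})$-polynomially Lipschitz with constants $(c,3\kappa_*\epsilon)$ for some $c\ge1$. We can again choose $p\in(1,(2\alpha+\beta)^{-1})$ such that $3\kappa_*^2\epsilon<1-1/p$, and apply Proposition 6 to obtain

$$\frac1n\sum_{k=1}^n\Big(\mathbb{E}\big[dM_k^2\mid\mathcal{F}_{k-1}\big]-\pi\big(g_{S_{k-1}}\big)\Big)\xrightarrow{n\to\infty}0$$

almost surely. Since $S_k\to S_\infty$ almost surely and $S_\infty$ is in the interior of $K_N$, there is an a.s. finite $N'$ such that $S_k\in K_{N'}$ for all $k\ge1$, and

$$|\pi(g_{S_k})-\pi(g_{S_\infty})|\le\|g_{S_k}-g_{S_\infty}\|_{V^{2\alpha}}\,\pi(V^{2\alpha})\le c'|S_k-S_\infty|\to0.$$

That is, $\pi(g_{S_k})\to\pi(g_{S_\infty}) = \sigma_{S_\infty}^2$, and hence

$$\frac1n\sum_{k=0}^{n-1}\pi(g_{S_k})\xrightarrow{n\to\infty}\sigma_{S_\infty}^2$$

almost surely. This yields (24).

Proof of (25). Applying Lemma 24 in Appendix B, we obtain

$$\mathbb{E}\big[dM_k^2\mathbf{1}_{\{|dM_k|>\varepsilon\sqrt n\}}\mid\mathcal{F}_{k-1}\big]\le4\,\mathbb{E}\big[\hat f_{S_{k-1}}^2(X_k)\mathbf{1}_{\{|\hat f_{S_{k-1}}(X_k)|>\varepsilon\sqrt n/2\}}\mid\mathcal{F}_{k-1}\big].$$

It follows that for all $\varepsilon,L>0$ and all sufficiently large $n\ge1$,

$$\frac1n\sum_{k=1}^n\mathbb{E}\big[dM_k^2\mathbf{1}_{\{|dM_k|>\varepsilon\sqrt n\}}\mid\mathcal{F}_{k-1}\big]\le\frac4n\sum_{k=1}^nh_{S_{k-1}}^{(L)}(X_{k-1}),\tag{26}$$

where

$$h_s^{(L)}(x) := \int P_s(x,dy)\,\hat f_s^2(y)\sup_{s'\in K_{N'}}\mathbf{1}_{\{|\hat f_{s'}(y)|>L\}}.$$

To ensure measurability, the supremum can be taken over a countable dense subset of $K_{N'}$. As before, one checks that for all $L>0$ the family $\{h_s^{(L)}\}_{s\in\mathbb{S}}$ is $(K_n,V^{2\alpha})$-polynomially Lipschitz with constants $(c,3\kappa_*\epsilon)$. Hence the right-hand side of (26) converges almost surely to

$$4\pi\big(h_{S_\infty}^{(L)}\big)=4\int\hat f_{S_\infty}^2(x)\sup_{s'\in K_{N'}}\mathbf{1}_{\{|\hat f_{s'}(x)|>L\}}\,\pi(x)\,dx,$$

and this limit tends to zero as $L\to\infty$ by monotone convergence. This yields (25). □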

Remark 8. Theorem 7 assumes that the adaptation parameter $S_n$ converges to some finite limit $S_\infty$. The convergence of $S_n$ in our sequentially constrained adaptive MCMC is, in general, outside the scope of this paper, and might require additional conditions on the adaptation mechanism. However, in Section 5 we see that in the case of Adaptive Metropolis this can be verified fairly easily.
Remark 9. The constraint functions $\sigma_n$ of our framework can be defined more generally, by allowing $\sigma_n$ to depend additionally on $\omega$ in an $\mathcal{F}_{n-1}$-measurable (predictable) manner. The proofs above need no modification to cover this generalisation. Indeed, we employ a different definition of $\sigma_n$ in the proof of the central limit theorem for the Adaptive Metropolis process in Section 5.

4. Bound for the Growth Rate 

In this section, we assume that $\mathbb{X}$ is a normed space, and establish a bound for the growth rate of the chain $(\|X_n\|)_{n\ge1}$ based on a general drift condition. The bound assumes little structure. One needs a drift function $V$ that grows rapidly enough, and the expected growth of $V(X_n)$ must be moderate.

Proposition 10. Suppose that there is a function $V:\mathbb{X}\to[1,\infty)$ such that the bound

$$P_sV(x)\le V(x)+b\tag{27}$$

holds for all $(x,s)\in\mathbb{X}\times\mathbb{S}$, where $b<\infty$ is a constant independent of $s$. Suppose also that $V$ grows rapidly enough, in the sense that

$$\inf_{\|x\|\ge u}V(x)\ge r(u)\tag{28}$$

for all $u\ge0$. Here $r:[0,\infty)\to[0,\infty)$ is a function growing faster than any polynomial, i.e. for any $p>0$ there is a $c = c(p)<\infty$ such that

$$\sup_{u\ge1}\frac{u^p}{r(u)}\le c.\tag{29}$$

Then, for any $\epsilon>0$, there is an a.s. finite $A = A(\omega,\epsilon)$ such that for all $n\ge1$

$$\|X_n\|\le An^{\epsilon}.$$



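Proof. To start with, (27) implies for $n\ge1$

$$\mathbb{E}[V(X_n)]=\mathbb{E}\big[\mathbb{E}[V(X_n)\mid\mathcal{F}_{n-1}]\big]=\mathbb{E}\big[P_{S_{n-1}}V(X_{n-1})\big]\le\mathbb{E}[V(X_{n-1})]+b\le\cdots\le V(x_0)+bn\le\tilde bV(x_0)n,$$

where $\tilde b := b+1$. Now fix $a\ge1$. We bound the probability that $\|X_n\|$ ever exceeds $an^{\epsilon}$, using (28), Markov's inequality, and (29) with $p = 3/\epsilon$, which provides the constant $c = c(3/\epsilon)<\infty$:

$$\begin{aligned}\mathbb{P}\Big(\max_{1\le n\le m}\frac{\|X_n\|}{n^{\epsilon}}>a\Big)&\le\sum_{n=1}^m\mathbb{P}\big(\|X_n\|>an^{\epsilon}\big)\le\sum_{n=1}^m\mathbb{P}\big(V(X_n)\ge r(an^{\epsilon})\big)\\&\le\sum_{n=1}^m\frac{\mathbb{E}[V(X_n)]}{r(an^{\epsilon})}\le\tilde bV(x_0)\sum_{n=1}^m\frac{n}{r(an^{\epsilon})}\le\tilde bV(x_0)\,c\,a^{-3/\epsilon}\sum_{n=1}^{\infty}n^{-2}.\end{aligned}$$

The bound is uniform in $m$ and tends to zero as $a\to\infty$. Hence $\sup_{n\ge1}\|X_n\|n^{-\epsilon}<\infty$ almost surely. □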
We record the following easy lemma, dealing with a particular choice of $V(x)$, for later use in Section 5.

Lemma 11. Assume that the target density $\pi$ is differentiable, bounded, bounded away from zero on compact sets, and satisfies the following radial decay condition:

$$\limsup_{\|x\|\to\infty}\frac{x}{\|x\|}\cdot\nabla\log\pi(x)<0.$$

Then, for $V(x) = c_V\pi^{-1/2}(x)$, the bound (28) holds with a function $r(u) := ce^{\gamma u}$ for some $\gamma,c>0$, and this $r$ satisfies (29).



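Proof. Let $R\ge1$ be such that $\sup_{\|x\|\ge R}\frac{x}{\|x\|}\cdot\nabla\log\pi(x)\le-\gamma$ for some $\gamma>0$. Assume $y\in\mathbb{R}^d$ with $\|y\|\ge2R$, and write $y = (1+a)x$, where $\|x\| = R$ and $a = \|y\|/R-1\ge1$. Denote $h(x) := \log\pi(x)$. Since $R\ge1$,

$$\log\frac{\pi(y)}{\pi(x)}=h\big((1+a)x\big)-h(x)=\int_1^{1+a}x\cdot\nabla h(tx)\,dt\le-\gamma a.$$

It follows that

$$V(y)=c_V\pi(x)^{-1/2}\Big(\frac{\pi(y)}{\pi(x)}\Big)^{-1/2}\ge c_Ve^{\gamma a/2}\inf_{\|x\|=R}\pi(x)^{-1/2}\ge ce^{(\gamma/(2R))\|y\|}.$$

Moreover, $V\ge c_V(\sup_x\pi(x))^{-1/2}>0$ on $\{x:\|x\|\le2R\}$ because $\pi$ is bounded. Hence we can select $c>0$ such that the bound applies to all $y\in\mathbb{R}^d$. □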

5. Ergodicity Result for Adaptive Metropolis 

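We start this section by outlining the original Adaptive Metropolis (AM) algorithm [9]. The AM chain starts from a point $X_0 = x_0\in\mathbb{R}^d$ with an initial covariance $\Sigma_0\in\mathcal{C}^d$, where $\mathcal{C}^d\subset\mathbb{R}^{d\times d}$ denotes the symmetric positive definite matrices. We generate, recursively for $n\ge0$,

$$X_{n+1}\sim P_{\theta\Sigma_n}(X_n,\,\cdot\,),\tag{30}$$
$$\Sigma_{n+1}=\begin{cases}\Sigma_0, & 0\le n\le N_b-1,\\ \operatorname{Cov}(X_0,\dots,X_n)+\kappa I, & n\ge N_b.\end{cases}\tag{31}$$

The quantities in (30)–(31) are as follows:

- $\theta>0$ is a parameter;
- $N_b\ge2$ is the length of the burn-in;
- $\kappa>0$ is a small constant, and $I$ is the identity matrix;
- $P_v(x,\cdot)$ is the Metropolis transition probability

$$P_v(x,A) := \mathbf{1}_A(x)\Big[1-\int\Big(1\wedge\frac{\pi(y)}{\pi(x)}\Big)q_v(y-x)\,dy\Big]+\int_A\Big(1\wedge\frac{\pi(y)}{\pi(x)}\Big)q_v(y-x)\,dy,\tag{32}$$

  where the proposal density $q_v$ is the Gaussian density with zero mean and covariance $v\in\mathcal{C}^d$.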
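For concreteness, the recursion (30)–(31) can be transcribed directly into Python, as in the deliberately unoptimised sketch below. It is an illustration, not the authors' implementation. The function and argument names are hypothetical, and the target is passed as an unnormalised log-density. The covariance is recomputed from scratch with `np.cov`, the unbiased estimator, which costs $O(nd^2)$ per step; a recursive update such as (33) avoids this cost.

```python
import numpy as np

def adaptive_metropolis(log_pi, x0, n_steps, sigma0, theta, kappa, n_burn, rng=None):
    """Sketch of the original AM recursion (30)-(31).

    The proposal at step n is N(X_n, theta * Sigma_n); Sigma_{n+1} equals Sigma_0
    during the burn-in and Cov(X_0, ..., X_n) + kappa * I afterwards.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    d = x.size
    chain = np.empty((n_steps + 1, d))
    chain[0] = x
    sigma = np.asarray(sigma0, dtype=float)          # Sigma_0
    log_px = log_pi(x)
    for n in range(n_steps):
        # (30): one Metropolis step from X_n with proposal covariance theta * Sigma_n
        y = rng.multivariate_normal(x, theta * sigma)
        log_py = log_pi(y)
        if np.log(rng.uniform()) < log_py - log_px:
            x, log_px = y, log_py
        chain[n + 1] = x
        # (31): after the burn-in, Sigma_{n+1} = Cov(X_0, ..., X_n) + kappa * I
        if n >= n_burn:
            emp = np.atleast_2d(np.cov(chain[: n + 1], rowvar=False))
            sigma = emp + kappa * np.eye(d)
    return chain
```

In the test, the target is a standard Gaussian in two dimensions and $\theta = 2.4^2/d$ is used as a common heuristic choice. The value of $\theta$ is not prescribed by the text.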
In this paper, purely for notational simplicity (see Remark 12), we consider a slight modification of the AM chain. Firstly, we do not use a burn-in period, i.e. we let $N_b = 0$ and $\Sigma_0\ge\kappa I$. Secondly, instead of (31), we construct $\Sigma_n$ recursively for $n\ge1$ as

$$\Sigma_n=\frac{n}{n+1}\Sigma_{n-1}+\frac{1}{n+1}\Big[(X_n-\bar X_{n-1})(X_n-\bar X_{n-1})^T+\kappa I\Big],\tag{33}$$

where $\bar X_n$ denotes the average of $X_0,\dots,X_n$.

Remark 12. The original AM process uses the unbiased estimate of the covariance matrix. In this case, the recursion formula for $\Sigma_n$ when $n\ge N_b+2$ has the form

$$\Sigma_n=\frac{n-2}{n-1}\Sigma_{n-1}+\frac{1}{n-1}\Big[\frac{n-1}{n}(X_{n-1}-\bar X_{n-2})(X_{n-1}-\bar X_{n-2})^T+\kappa I\Big].\tag{34}$$

This recursion can also be formulated in the framework of Section 2 by simply introducing a sequence of adaptation functions $H_n(s,x)$. Our proof applies with obvious changes. In the present paper, however, we prefer (33) for simpler notation. Also, from a practical point of view, observe that (33) differs from (34) by a factor smaller than …, so it is mostly a matter of taste whether to use (33) or (34).
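The covariance identity behind (34) can be checked numerically. In the sketch below, $C_n$ denotes the unbiased sample covariance of $X_0,\dots,X_n$. It is updated as

$$C_n = \frac{n-1}{n}C_{n-1} + \frac{1}{n+1}(X_n-\bar X_{n-1})(X_n-\bar X_{n-1})^T,$$

and adding $\kappa I$ gives the AM proposal covariance. The function name is illustrative. The test compares the recursion against `numpy.cov` on random data.

```python
import numpy as np

def unbiased_cov_recursive(xs):
    """Recursively compute C_n = Cov(X_0, ..., X_n) (normalised by n) for the
    rows of xs, using C_n = (n-1)/n C_{n-1} + 1/(n+1) (X_n - Xbar_{n-1})(...)^T."""
    xs = np.asarray(xs, dtype=float)
    d = xs.shape[1]
    mean = xs[0].copy()                  # Xbar_0
    cov = np.zeros((d, d))               # multiplied by (n-1)/n = 0 at n = 1
    for n in range(1, len(xs)):
        dev = xs[n] - mean               # X_n - Xbar_{n-1}
        cov = (n - 1) / n * cov + np.outer(dev, dev) / (n + 1)
        mean = mean + dev / (n + 1)      # Xbar_n
    return cov
```

The same kind of rank-one update underlies (33), which only changes the weights.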



In the notation of the general adaptive MCMC framework of Section 2, the state space is $\mathbb{X} := \mathbb{R}^d$. The adaptation parameter $S_n = (S_n^{(m)},S_n^{(c)})$ consists of the mean $S_n^{(m)}$ and the covariance $S_n^{(c)}$, and takes values in $\mathbb{S} := \mathbb{R}^d\times\mathcal{C}^d$. The space $\mathcal{S} := \mathbb{R}^d\times\mathbb{R}^{d\times d}\supset\mathbb{S}$ is equipped with the norm $|s| := \|s^{(m)}\|\vee\|s^{(c)}\|$. Here we use the Euclidean norm for the mean and the matrix norm $\|A\| := \operatorname{trace}(A^TA)^{1/2}$ for the covariance. The Metropolis kernel is defined as in (32) by $P_s := P_{\theta s^{(c)}}$ for $s\in\mathbb{S}$. The adaptation function $H$ is defined for $s = (s^{(m)},s^{(c)})$ as

$$H(s,x) := \Big(x-s^{(m)},\ (x-s^{(m)})(x-s^{(m)})^T-s^{(c)}+\kappa I\Big),$$

and the adaptation weights are $\eta_n := (n+1)^{-1}$.
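The sketch below runs the recursion $S_n = S_{n-1}+\eta_n H(S_{n-1},X_n)$ with $\eta_n = (n+1)^{-1}$ along a given path. The helper names are hypothetical, and no constraint $\sigma_n$ is applied, as in the unmodified AM chain. Assuming $S_0^{(m)} = X_0$, it checks three properties:

- $S_n^{(m)}$ equals the running average $\bar X_n$;
- $S_n^{(c)}$ agrees with the closed form $(n+1)S_n^{(c)} = S_0^{(c)}+\sum_{k=1}^n\big[(X_k-\bar X_{k-1})(X_k-\bar X_{k-1})^T+\kappa I\big]$ implied by (33);
- all eigenvalues of $S_n^{(c)}$ stay at least $\kappa$ when $S_0^{(c)}\ge\kappa I$.

```python
import numpy as np

def am_H(s_mean, s_cov, x, kappa):
    """Adaptation function H(s, x), returned as (mean part, covariance part)."""
    dx = x - s_mean
    return dx, np.outer(dx, dx) - s_cov + kappa * np.eye(x.size)

def am_parameter_path(xs, s_cov0, kappa):
    """Iterate S_n = S_{n-1} + eta_n H(S_{n-1}, X_n), eta_n = 1/(n+1), along
    xs = [X_0, ..., X_N], starting from S_0 = (X_0, s_cov0); no constraint."""
    s_mean = np.array(xs[0], dtype=float)
    s_cov = np.array(s_cov0, dtype=float)
    for n in range(1, len(xs)):
        eta = 1.0 / (n + 1)
        h_mean, h_cov = am_H(s_mean, s_cov, np.asarray(xs[n], dtype=float), kappa)
        s_mean = s_mean + eta * h_mean     # both parts use S_{n-1}
        s_cov = s_cov + eta * h_cov
    return s_mean, s_cov
```

Because each update is a convex combination of the previous value and a new term, the growth bounds of Lemma 15 transfer from the chain to the parameters.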

We now formulate our ergodicity result for the AM chain. 

Theorem 13. Assume $\pi$ is positive, bounded, bounded from below on compact sets, differentiable, and satisfies

$$\lim_{r\to\infty}\sup_{\|x\|\ge r}\frac{x}{\|x\|^{\rho}}\cdot\nabla\log\pi(x)=-\infty\tag{35}$$

for some $\rho>1$. Moreover, assume that $\pi$ has regular contours:

$$\lim_{r\to\infty}\sup_{\|x\|\ge r}\frac{x}{\|x\|}\cdot\frac{\nabla\pi(x)}{\|\nabla\pi(x)\|}<0.\tag{36}$$

Define $V(x) := c_V\pi^{-1/2}(x)$ with $c_V := (\sup_x\pi(x))^{1/2}$. Then, for any $f$ with $\|f\|_{V^\alpha}<\infty$, where $0<\alpha<1$,

$$\frac1n\sum_{k=1}^nf(X_k)\xrightarrow{n\to\infty}\pi(f)\tag{37}$$

almost surely. If, in addition, $\alpha<1/2$, then

$$\frac{1}{\sqrt n}\sum_{k=1}^n\big[f(X_k)-\pi(f)\big]\xrightarrow{n\to\infty}N(0,\sigma^2)\tag{38}$$

in distribution, where $\sigma^2\in[0,\infty)$ is a constant.

Remark 14. If the conditions of Theorem 13 are satisfied, the function $V(x)$ grows faster than any exponential, and hence (37) and (38) hold for exponential moments. In particular, they hold for power moments, i.e. for $f(x) = \|x\|^p$ with any $p>0$. Therefore also $S_n\to(m_\pi,v_\pi+\kappa I)$, where $m_\pi$ and $v_\pi$ are the mean and covariance of $\pi$.

The proof of Theorem 13 is postponed to the end of this section. We start with a simple lemma bounding the growth rate of the AM chain.

Lemma 15. If the conditions of Proposition 10 are satisfied for an AM chain, then for any $\epsilon>0$ there is an a.s. finite $A = A(\omega,\epsilon)$ such that

$$\|S_n^{(m)}\|\le An^{\epsilon}\quad\text{and}\quad\|S_n^{(c)}\|\le An^{\epsilon}.$$

Proof. Since the AM recursion is a convex combination, this is a straightforward corollary of Proposition 10. □

Next, we show that each of the Metropolis kernels used by the AM algorithm satisfies a geometric drift condition, and we bound the constants of geometric drift. The result in Proposition 18 is similar to the results obtained in [12, 17], except that we have a common minorisation set $C$ for all proposal scalings. We start with two lemmas. We define $B(x,r) := \{y\in\mathbb{R}^d:\|x-y\|<r\}$.

Lemma 16. Assume $E\subset\mathbb{R}^d$ is measurable and $A\subset\mathbb{R}^d$ is compact, given as

$$A := \{ru : u\in S^d,\ 0\le r\le g(u)\},$$

where $S^d := \{u\in\mathbb{R}^d:\|u\| = 1\}$ is the unit sphere, and $g:S^d\to[b,\infty)$ is a measurable function parameterising the boundary $\partial A$, with some $b>0$.

For any $\epsilon>0$, define $B_\epsilon := \{ru:u\in S^d,\ g(u)<r\le g(u)+\epsilon\}$. Then, for all $\bar\epsilon>0$, there is a $\bar b = \bar b(\bar\epsilon)\in(0,\infty)$ with the following property. For all $0<\epsilon\le\bar\epsilon$ and all $\lambda\ge3\epsilon$,

$$|E\cap B_\epsilon|\le\big|\big(E\oplus B(0,\lambda)\big)\cap A\big|$$

whenever $b\ge\bar b$. Above, $A\oplus B := \{x+y:x\in A,\ y\in B\}$ denotes the Minkowski sum.

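Proof. See Figure 1 for an illustration of the situation. Denote by $S^* := \{u\in S^d:\exists r>0,\ ru\in E\cap B_\epsilon\}$ the projection of the set $E\cap B_\epsilon$ onto $S^d$. Then $E\cap B_\epsilon\subset\{ru:u\in S^*,\ g(u)<r\le g(u)+\epsilon\}$ and $A\supset\{ru:u\in S^*,\ 0\le r\le g(u)\}$. Now, for $\epsilon\le\lambda\le g(u)$, we have

$$\big((E\cap B_\epsilon)\oplus B(0,\lambda)\big)\cap A\supset\{ru:u\in S^*,\ g(u)-\lambda+\epsilon<r\le g(u)\}=:G.$$

To see this, let $ru\in G$. Then there is $g(u)<\tilde r\le g(u)+\epsilon$ such that $\tilde ru\in E\cap B_\epsilon$, and we can write $ru = \tilde ru+(r-\tilde r)u$, where $(r-\tilde r)u\in B(0,\lambda)$. Clearly $E\oplus B(0,\lambda)\supset(E\cap B_\epsilon)\oplus B(0,\lambda)$. Since $\lambda\ge3\epsilon$, we can therefore estimate

$$\begin{aligned}\big|\big(E\oplus B(0,\lambda)\big)\cap A\big|-|E\cap B_\epsilon|&\ge\int_{S^*}\int_{g(u)-2\epsilon}^{g(u)}r^{d-1}\,dr\,\mathcal{H}^{d-1}(du)-\int_{S^*}\int_{g(u)}^{g(u)+\epsilon}r^{d-1}\,dr\,\mathcal{H}^{d-1}(du)\\&=\frac1d\int_{S^*}\Big[2g(u)^d-\big(g(u)-2\epsilon\big)^d-\big(g(u)+\epsilon\big)^d\Big]\mathcal{H}^{d-1}(du),\end{aligned}$$

where $\mathcal{H}^{d-1}$ denotes the $(d-1)$-dimensional Hausdorff measure. This integral is non-negative for all $0<\epsilon\le c_db$, for some constant $c_d$ depending only on the dimension $d$. To see this, let $h(\epsilon) := (y-2\epsilon)^d+(y+\epsilon)^d$. By the mean value theorem, for some $0<\epsilon'<\epsilon$,

$$h(0)-h(\epsilon)=\epsilon d(y-2\epsilon')^{d-1}\Big[2-\Big(\frac{y+\epsilon'}{y-2\epsilon'}\Big)^{d-1}\Big]\ge0$$

whenever $\epsilon\le c_dy$. □

Figure 1. Illustration of the boundary estimate, showing the set $E\oplus B(0,\lambda)$. The set $A$ is in light grey, and the set … in dark grey.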



Lemma 17. Let $f(x) := xe^{-x^2/2}$. For any $0<\epsilon\le1/8$, the following estimates hold:

$$2f(x+\epsilon)-f(x)\ge\frac x8\quad\text{for all }0\le x\le\frac12,\quad\text{and}$$
$$\int_0^\infty\big([2f(x+\epsilon)-f(x)]\wedge0\big)\,dx\ge-e^{-c/\epsilon^2}$$

for some constant $c>0$.
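Proof. We can write

$$2f(x + \epsilon) - f(x) = e^{-x^2/2}\Big[2(x + \epsilon)e^{-\epsilon x - \epsilon^2/2} - x\Big],$$

which is positive whenever $e^{-\epsilon x - \epsilon^2/2} > 2/3$; this holds at least for all $0 \le x < x^*$, with

$$x^* := \frac{\log(3/2)}{\epsilon} - \frac{\epsilon}{2} \ge \frac{1}{4\epsilon}.$$

Now, $x^* > 1/2$ and we can estimate

$$2f(x + \epsilon) - f(x) \ge \frac{x}{3}e^{-x^2/2} \ge \frac{x}{8}$$

for all $0 \le x \le 1/2$. Also,

$$\int_0^\infty \big([2f(x + \epsilon) - f(x)] \wedge 0\big)\,dx \ge -\int_{x^*}^\infty xe^{-x^2/2}\,dx = -e^{-(x^*)^2/2} \ge -e^{-c\epsilon^{-2}}$$

with $c = 1/32$. □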
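Both estimates of Lemma 17 can be spot-checked numerically. The sketch below is only a sanity check on a grid, not a proof; the grid sizes and the truncation of the integral at $x = 40$ (where $f$ underflows to zero in double precision) are our own choices, and the function names are hypothetical.

```python
import math


def f(x: float) -> float:
    return x * math.exp(-0.5 * x * x)


def g(x: float, eps: float) -> float:
    """The function 2 f(x + eps) - f(x) appearing in Lemma 17."""
    return 2.0 * f(x + eps) - f(x)


def first_estimate_holds(eps: float, n: int = 2001) -> bool:
    """Check 2 f(x + eps) - f(x) >= x / 8 on a uniform grid of [0, 1/2]."""
    xs = (0.5 * i / (n - 1) for i in range(n))
    return all(g(x, eps) >= x / 8.0 for x in xs)


def negative_part_integral(eps: float, upper: float = 40.0, n: int = 40_001) -> float:
    """Trapezoidal approximation of int_0^upper min(2 f(x + eps) - f(x), 0) dx."""
    h = upper / (n - 1)
    vals = [min(g(i * h, eps), 0.0) for i in range(n)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))


for eps in (0.01, 0.05, 0.1, 0.124):
    lower = -math.exp(-1.0 / (32.0 * eps ** 2))  # -e^{-c eps^{-2}} with c = 1/32
    print(eps, first_estimate_holds(eps), negative_part_integral(eps) >= lower)
```

For small $\epsilon$ the negative part only starts around $x \approx \log 2/\epsilon$, so the computed integral is essentially zero; the bound is informative mainly near $\epsilon = 1/8$.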

Proposition 18. Assume that $\pi$ satisfies the conditions in Theorem 13 and $\kappa > 0$. Then, there exist a compact set $C \subset \mathbb{R}^d$, a probability measure $\nu$ on $C$, and a constant $b \in [0,\infty)$ such that for the Metropolis transition probability $P_v$ in (32) and for all positive definite matrices $v$ with all eigenvalues greater than $\kappa$, it holds that

$$P_vV(x) \le \lambda_vV(x) + b\mathbf{1}_C(x), \qquad \forall x \in X, \tag{39}$$

$$P_v(x,B) \ge \delta_v\nu(B), \qquad \forall x \in C,\ \forall B \subset X, \tag{40}$$

where $V(x) := c_V\pi^{-1/2}(x) \ge 1$ with $c_V := (\sup_x \pi(x))^{1/2}$, and the constants $\lambda_v, \delta_v \in (0,1)$ satisfy the bound

$$(1 - \lambda_v)^{-1} \vee \delta_v^{-1} \le c\det(v)^{1/2}$$

for some constant $c \ge 1$.



Proof. Define the sets $A_x := \{y : \pi(y) \ge \pi(x)\}$ and its complement $R_x := \{y : \pi(y) < \pi(x)\}$, which are the regions of almost sure acceptance and possible rejection at $x$, respectively. Let $R \ge 1$ be sufficiently large to ensure that for all $\|x\| \ge R$, it holds that

$$\frac{x}{\|x\|}\cdot\frac{\nabla\pi(x)}{\|\nabla\pi(x)\|} \le -\gamma \qquad\text{and}\qquad \frac{x}{\|x\|}\cdot\nabla\log\pi(x) \le -\|x\|^{\rho-1}$$

for some $\gamma > 0$. Suppose that the dimension $d \ge 2$. Lemma 25 in Appendix C implies that, for $R$ sufficiently large, we have $B(0, M^{-1}\|x\|) \subset A_x \subset B(0, M\|x\|)$ for all $\|x\| \ge R$ with some constant $M \ge 1$. Moreover, we can parameterise $A_x = \{ru : u \in S^d,\ 0 \le r \le g(u)\}$, where $S^d := \{u \in \mathbb{R}^d : \|u\| = 1\}$ is the unit sphere and $g : S^d \to [M^{-1}\|x\|, M\|x\|]$.

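Consider (39). We may compute

$$1 - \frac{P_vV(x)}{V(x)} = \int_{A_x}\left(1 - \sqrt{\frac{\pi(x)}{\pi(y)}}\right)q_v(y - x)\,dy - \int_{R_x}\left(\sqrt{\frac{\pi(y)}{\pi(x)}} - \frac{\pi(y)}{\pi(x)}\right)q_v(y - x)\,dy. \tag{41}$$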
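In what follows, unless explicitly stated, we assume $\|x\| \ge M(R + 1)$. Denote $\epsilon_x := \|x\|^{-(\rho-1)/2} < 1$, where $(\rho - 1)/2 > 0$. Define $\tilde A_x := \{ru : u \in S^d,\ 0 \le r < g(u) - \epsilon_x\} \subset A_x$ and $\tilde R_x := \{ru : u \in S^d,\ r > g(u) + \epsilon_x\} \subset R_x$. From (41), we can estimate

$$1 - \frac{P_vV(x)}{V(x)} \ge \int_{\tilde A_x \cup (R_x \setminus \tilde R_x)}\left[\left(1 - \sqrt{\tfrac{\pi(x)}{\pi(y)}}\right)\mathbf{1}_{A_x}(y) - \left(\sqrt{\tfrac{\pi(y)}{\pi(x)}} - \tfrac{\pi(y)}{\pi(x)}\right)\mathbf{1}_{R_x}(y)\right]q_v(y - x)\,dy - \sup_z q_v(z - x)\int_{\tilde R_x}\sqrt{\frac{\pi(y)}{\pi(x)}}\,dy. \tag{42}$$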



We estimate the two terms on the right-hand side separately, starting with the first.

Let $h(x) := \log\pi(x)$. Suppose $z \in \tilde A_x$, and write $z = (1 - a/\|y\|)y$ for some $y \in \partial A_x$ and $\epsilon_x < a \le \|y\|$. Assume for a moment that $\|z\| \ge R$. Then $h$ is decreasing on the line segment from $z$ to $y$, and we can estimate

$$\frac{\pi(x)}{\pi(z)} = \frac{\pi(y)}{\pi(z)} \le \exp\left(-\int_{\|y\|-a}^{\|y\|}t^{\rho-1}\,dt\right) \le e^{-\epsilon_x(\|y\| - \epsilon_x)^{\rho-1}} \le e^{-\epsilon_x(\|x\|/(2M))^{\rho-1}}.$$

Hence, in this case, $\pi(x)/\pi(z) \le 1/4$, assuming $\|x\| \ge R_2$ for sufficiently large $R_2 \ge R$. If $\|z\| < R$, then there is $z'$ such that $\|z'\| = R$ and the estimate above holds for $z'$. Consequently,

$$\frac{\pi(x)}{\pi(z)} = \frac{\pi(x)}{\pi(z')}\cdot\frac{\pi(z')}{\pi(z)} \le e^{-\epsilon_x(\|x\|/(2M))^{\rho-1}}\,\frac{\sup_{\|w\|\le R}\pi(w)}{\inf_{\|w\|\le R}\pi(w)} \le \frac{1}{4} \tag{43}$$

whenever $\|x\| \ge R_2$, by increasing $R_2$ if needed. In conclusion, we have shown that for $\|x\| \ge R_2$, it holds that $1 - \sqrt{\pi(x)/\pi(y)} \ge 1/2$ for all $y \in \tilde A_x$.
By Fubini's theorem, we can write for positive $f$ that

$$\int f(z + x)q_v(z)\,dz = \frac{c_d}{\sqrt{\det(v)}}\int_0^1\int_{\{z\,:\,e^{-\frac12 z^Tv^{-1}z} > t\}} f(z + x)\,dz\,dt = \frac{c_d}{\sqrt{\det(v)}}\int_0^\infty\int_{E_u} f(y)\,dy\;ue^{-u^2/2}\,du,$$

where $c_d := (2\pi)^{-d/2}$ and $E_u := \{z + x : z^Tv^{-1}z < u^2\}$. Consequently, for $\|x\| \ge R_2$, we can estimate the first term of (42) from below by
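$$\frac{c_d}{\sqrt{\det(v)}}\int_0^\infty\left(\frac{|E_u \cap \tilde A_x|}{2} - \frac{|E_u \cap (R_x \setminus \tilde R_x)|}{4}\right)ue^{-u^2/2}\,du$$

$$\ge \frac{c_d}{4\sqrt{\det(v)}}\int_0^\infty\Big(2|E_{u+a} \cap \tilde A_x|(u + a)e^{-(u+a)^2/2} - |E_u \cap (R_x \setminus \tilde R_x)|\,ue^{-u^2/2}\Big)\,du$$

$$\ge \frac{c_d}{4\sqrt{\det(v)}}\int_0^\infty\Big(2\big|(E_u \oplus B(0,\kappa^{1/2}a)) \cap \tilde A_x\big|(u + a)e^{-(u+a)^2/2} - |E_u \cap B_x|\,ue^{-u^2/2}\Big)\,du$$

for any $a > 0$. Here we used $1 - \sqrt{\pi(x)/\pi(y)} \ge 1/2$ on $\tilde A_x$ and $\sqrt{t} - t \le 1/4$ for $t \in [0,1]$. Moreover, a simple computation shows that $E_u \oplus B(0,\kappa^{1/2}a) = \{x + y : x \in E_u,\ y \in B(0,\kappa^{1/2}a)\} \subset E_{u+a}$, and since we may write $\tilde A_x = \{ru : u \in S^d,\ 0 \le r < \tilde g(u)\}$ with $\tilde g(u) = g(u) - \epsilon_x$, we obtain that $R_x \setminus \tilde R_x \subset \{ru : u \in S^d,\ \tilde g(u) < r \le \tilde g(u) + 2\epsilon_x\} =: B_x$. We set $a = 6\kappa^{-1/2}\epsilon_x$ and apply Lemma 16 with the choices $\epsilon = 2\epsilon_x$ and $\lambda = 6\epsilon_x$, which gives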
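$$|E_u \cap B_x| \le \big|[E_u \oplus B(0,6\epsilon_x)] \cap \tilde A_x\big|,$$

so that, as $\kappa^{1/2}a = 6\epsilon_x$, the first term of (42) is at least

$$\frac{c_d}{4\sqrt{\det(v)}}\int_0^\infty\big|[E_u \oplus B(0,6\epsilon_x)] \cap \tilde A_x\big|\Big(2(u + a)e^{-(u+a)^2/2} - ue^{-u^2/2}\Big)\,du$$

$$\ge \frac{c_d}{4\sqrt{\det(v)}}\left(\int_{1/4}^{1/2}|E_u \cap \tilde A_x|\,\frac{u}{8}\,du - |\tilde A_x|\,e^{-ca^{-2}}\right)$$

$$\ge \frac{c_d}{4\sqrt{\det(v)}}\left(\frac{3}{256}|E_{1/4} \cap \tilde A_x| - c_2\|x\|^d e^{-c_1\|x\|^{\rho-1}}\right)$$

by Lemma 17 (applied with $\epsilon = a < 1/8$) for sufficiently large $\|x\|$, and since the sets $E_u$ are increasing with respect to $u$.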
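The inclusion $E_u \oplus B(0,\kappa^{1/2}a) \subset E_{u+a}$ follows from the triangle inequality for the norm $z \mapsto (z^Tv^{-1}z)^{1/2}$ together with $z^Tv^{-1}z \le \|z\|^2/\kappa$ when all eigenvalues of $v$ are at least $\kappa$. A small randomised check in two dimensions is sketched below; the covariance is built from hypothetical eigenvalues and a rotation angle of our own choosing, the centre $x$ is taken to be the origin (the sets are translation invariant), and the points are sampled near the boundaries, where the inclusion is tightest.

```python
import math
import random


def make_cov(l1: float, l2: float, angle: float):
    """2x2 covariance R diag(l1, l2) R^T for the rotation R by `angle`."""
    c, s = math.cos(angle), math.sin(angle)
    return ((l1 * c * c + l2 * s * s, (l1 - l2) * c * s),
            ((l1 - l2) * c * s, l1 * s * s + l2 * c * c))


def quad_inv(v, z) -> float:
    """z^T v^{-1} z for a symmetric positive definite 2x2 matrix v."""
    (p, q), (_, r) = v
    det = p * r - q * q
    x, y = z
    return (r * x * x - 2.0 * q * x * y + p * y * y) / det


def cholesky2(v):
    """Entries (l11, l21, l22) of the lower-triangular L with L L^T = v."""
    (p, q), (_, r) = v
    l11 = math.sqrt(p)
    l21 = q / l11
    return l11, l21, math.sqrt(r - l21 * l21)


def near_unit_point(rng: random.Random, rmin: float = 0.9, rmax: float = 0.999):
    """Point with norm in [rmin, rmax] and a uniformly random direction."""
    t = rng.uniform(0.0, 2.0 * math.pi)
    rad = rng.uniform(rmin, rmax)
    return rad * math.cos(t), rad * math.sin(t)


def containment_holds(l1, l2, angle, u, a, trials=10_000, seed=1) -> bool:
    kappa = min(l1, l2)                  # smallest eigenvalue of v
    v = make_cov(l1, l2, angle)
    l11, l21, l22 = cholesky2(v)
    rng = random.Random(seed)
    for _ in range(trials):
        w1, w2 = near_unit_point(rng)
        z = (u * l11 * w1, u * (l21 * w1 + l22 * w2))        # z^T v^{-1} z < u^2
        e1, e2 = near_unit_point(rng)
        r = math.sqrt(kappa) * a
        y = (r * e1, r * e2)                                  # ||y|| < kappa^{1/2} a
        if quad_inv(v, (z[0] + y[0], z[1] + y[1])) >= (u + a) ** 2:
            return False
    return True


print(containment_holds(1.0, 9.0, 0.3, u=0.25, a=0.05))
```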
We have that $E_{1/4} \supset B(x, \kappa^{1/2}/4)$. If $\|x\| \to \infty$, then $\epsilon_x \to 0$ and also $|B(x,\kappa^{1/2}/4) \cap A_x| - |B(x,\kappa^{1/2}/4) \cap \tilde A_x| \to 0$. Moreover, it holds that $|B(x,\kappa^{1/2}/4) \cap A_x| \ge c_3 > 0$ (see the proof of Theorem 4.3 in [12]). So, for large enough $\|x\|$, there is a $c_4 > 0$ such that $|E_{1/4} \cap \tilde A_x| \ge c_4$. To sum up, by choosing $R_3$ sufficiently large, we obtain that the first term of (42) is at least $c_5\det(v)^{-1/2}$ for all $\|x\| \ge R_3$, with some $c_5 > 0$.
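The Fubini (layer-cake) identity used above can be checked in one dimension, where $c_1 = (2\pi)^{-1/2}$ and $E_u = (x - u\sqrt v,\ x + u\sqrt v)$. The sketch below (hypothetical helper names) takes $f$ to be the indicator of an interval $[\alpha, \beta]$, so the left-hand side is a difference of Gaussian distribution functions, and approximates the right-hand side with the trapezoidal rule truncated at $u = 12$.

```python
import math


def normal_cdf(t: float) -> float:
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def lhs(x: float, v: float, alpha: float, beta: float) -> float:
    """int 1_[alpha, beta](z + x) q_v(z) dz for the N(0, v) density q_v."""
    s = math.sqrt(v)
    return normal_cdf((beta - x) / s) - normal_cdf((alpha - x) / s)


def rhs(x: float, v: float, alpha: float, beta: float,
        umax: float = 12.0, n: int = 24_001) -> float:
    """c_1 v^{-1/2} int_0^umax |E_u cap [alpha, beta]| u e^{-u^2/2} du (trapezoid)."""
    s = math.sqrt(v)
    h = umax / (n - 1)
    total = 0.0
    for i in range(n):
        u = i * h
        # Length of the intersection of E_u = (x - u s, x + u s) with [alpha, beta].
        length = max(0.0, min(beta, x + u * s) - max(alpha, x - u * s))
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * length * u * math.exp(-0.5 * u * u)
    return total * h / math.sqrt(2.0 * math.pi * v)


print(lhs(0.3, 2.0, -0.5, 1.7), rhs(0.3, 2.0, -0.5, 1.7))
```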
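Next, we turn to the second term of (42). We obtain by polar integration that

$$\int_{\tilde R_x}\sqrt{\frac{\pi(y)}{\pi(x)}}\,dy = \int_{S^d}\int_{g(u)+\epsilon_x}^\infty r^{d-1}e^{\frac12[h(ru) - h(g(u)u)]}\,dr\,\mathcal{H}^{d-1}(du) \le c_d'\sup_{M^{-1}\|x\| \le w \le M\|x\|}\int_{w+\epsilon_x}^\infty r^{d-1}e^{-\frac{r^\rho - w^\rho}{2\rho}}\,dr,$$

where $\mathcal{H}^{d-1}$ is the $(d-1)$-dimensional Hausdorff measure and $c_d' := \mathcal{H}^{d-1}(S^d)$. Denote $T(w,r) := r^{d-1}e^{-\frac{r^\rho - w^\rho}{4\rho}}$ and let us estimate the latter integral from above by

$$\int_{w+\epsilon_x}^\infty e^{-\frac{r^\rho - w^\rho}{4\rho}}\,dr\sup_{r\ge w+\epsilon_x}T(w,r) \le \int_{w+\epsilon_x}^\infty e^{-\frac{w^{\rho-1}(r - w)}{4}}\,dr\sup_{r\ge w+\epsilon_x}T(w,r) \le 4M^{\rho-1}\|x\|^{1-\rho}\sup_{r\ge w+\epsilon_x}T(w,r)$$

for any $w \ge M^{-1}\|x\|$. Suppose first that $w + \epsilon_x \le r \le 2w$; then

$$T(w,r) \le (2w)^{d-1}e^{-\frac{w^{\rho-1}\epsilon_x}{4}} \le (2M\|x\|)^{d-1}e^{-\frac14 M^{1-\rho}\|x\|^{(\rho-1)/2}} \le c_6$$

for any $M^{-1}\|x\| \le w \le M\|x\|$. For any $r \ge 2w$ and $w \ge 1$, we have

$$T(w,r) \le r^{d-1}e^{-\frac{r^\rho}{8\rho}} \le c_7.$$

Put together, letting $R_4 \ge R_3$ be sufficiently large, we obtain that $1 - P_vV(x)/V(x) \ge c_8\det(v)^{-1/2}$ with $c_8 = c_5/2$ for all $\|x\| \ge R_4$.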
To sum up, by setting $C := \bar B(0,R_4)$, we get that for all $v$ with eigenvalues bounded from below by $\kappa$, the estimate $P_vV(x) \le \lambda_vV(x)$ holds for $x \notin C$ with $\lambda_v := 1 - c_8\det(v)^{-1/2}$, satisfying $(1 - \lambda_v)^{-1} \le c_8^{-1}\det(v)^{1/2}$. For $x \in C$, we have by (41) that $P_vV(x) \le 2V(x) \le 2\sup_{z\in C}V(z) \le b < \infty$, so (39) holds. In the one-dimensional case, the above estimates can be applied separately to the two tails of the distribution.
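To see the drift (39) at work in one dimension, the sketch below evaluates $P_vV(x)/V(x)$ by quadrature, splitting the proposals into $A_x$ and $R_x$ exactly as in (41). The target $\pi(y) \propto e^{-y^4}$ (which has super-exponential tails) and the Gaussian proposal with variance $v$ are illustrative assumptions of ours; the normalising constants of $\pi$ and $c_V$ cancel in the ratio. Far from the origin the ratio drops well below one, while near the origin it stays below the crude bound $2$ used for $x \in C$.

```python
import math


def log_target(y: float) -> float:
    """Unnormalised log-density of the illustrative target pi(y) proportional to exp(-y^4)."""
    return -y ** 4


def integrand_weight(lx: float, ly: float) -> float:
    """Contribution of a proposal y to P_v V(x) / V(x), with V = c_V pi^{-1/2}.

    y in A_x (pi(y) >= pi(x)): always accepted, weight V(y)/V(x) = sqrt(pi(x)/pi(y)).
    y in R_x: accepted w.p. t = pi(y)/pi(x), contributing t * V(y)/V(x) = sqrt(t),
    rejected w.p. 1 - t, keeping V(x); total 1 - t + sqrt(t) <= 5/4.
    """
    if ly >= lx:
        return math.exp(0.5 * (lx - ly))
    t = math.exp(ly - lx)
    return 1.0 - t + math.sqrt(t)


def drift_ratio(x: float, v: float, width: float = 10.0, n: int = 20_001) -> float:
    """Trapezoidal approximation of P_v V(x) / V(x) for N(0, v) proposals."""
    s = math.sqrt(v)
    h = 2.0 * width * s / (n - 1)
    lx = log_target(x)
    total = 0.0
    for i in range(n):
        z = -width * s + i * h
        q = math.exp(-0.5 * z * z / v) / math.sqrt(2.0 * math.pi * v)
        w = 0.5 if i in (0, n - 1) else 1.0
        total += w * q * integrand_weight(lx, log_target(x + z))
    return total * h


for x in (0.0, 1.0, 2.0, 3.0, 5.0):
    print(x, round(drift_ratio(x, 1.0), 4))
```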

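Finally, set $\nu(B) := |C|^{-1}|B \cap C|$, and consider the minorisation condition (40) for $x \in C$:

$$P_v(x,B) \ge \int_{B\cap C}\left(1 \wedge \frac{\pi(y)}{\pi(x)}\right)q_v(y - x)\,dy \ge \inf_{x',y'\in C}q_v(y' - x')\int_{B\cap C}\left(1 \wedge \frac{\pi(y)}{\pi(x)}\right)dy \ge \frac{c_d}{\sqrt{\det(v)}}\,e^{-\frac{\operatorname{diam}(C)^2}{2\kappa}}\,\frac{\inf_{z\in C}\pi(z)}{\sup_z\pi(z)}\,|B \cap C|.$$

So (40) holds with $\delta_v := c_9\det(v)^{-1/2}$ for some $c_9 > 0$. Finally, the claim holds with $c := c_8^{-1} \vee c_9^{-1}$. □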

Finally, we are ready to prove the strong law of large numbers for the AM process.

Proof of Theorem 13. We start by verifying the strong law of large numbers (37). Fix $t \ge 1$ and consider first the constrained process $(X_n^{(t)}, S_n^{(t)})_{n\ge0}$, which is defined as the AM chain but with the constraint sets $K_n^{(t)} := \{s \in S : |s| \le tn^{\epsilon'}\}$, with $\epsilon' = \epsilon/(2d)$ and $\epsilon \in \big(0, \kappa_*^{-1}[(1/2) \wedge (1 - \alpha)]\big)$, where $\kappa_*$ is the independent constant of Theorem 1.

We check that the assumptions (A1)-(A4) are satisfied by the constrained process $(X_n^{(t)}, S_n^{(t)})_{n\ge0}$ for all $t \ge 1$. The condition (A1) is satisfied by construction of the Metropolis kernels $P_s$. Since $\det(s^{(v)}) \le (tn^{\epsilon'})^d$ for $s \in K_n^{(t)}$, Proposition 18 ensures that there is a compact $C \subset \mathbb{R}^d$ such that (A2) holds. For (A3), we refer to [1, Lemma 13], stating that $\|P_sf - P_{s'}f\|_{V^r} \le 2d\kappa^{-1}\|f\|_{V^r}|s^{(v)} - s'^{(v)}|$ for all $s^{(v)}, s'^{(v)}$ with eigenvalues bounded from below by $\kappa$.

Finally, we check that (A4) holds for any $\beta \in (0, 1/2]$. Similarly as in [3], we have that

$$\sup_{s\in K_n^{(t)}}\|H(s,\cdot)\|_{V^\beta} = \sup_{s\in K_n^{(t)}}\sup_{x\in X}\frac{|H(s,x)|}{V^\beta(x)} \le \sup_{x\in X}\frac{\|x\| + \|x\|^2 + t^2n^{2\epsilon'} + 2tn^{\epsilon'} + 2\|x\|tn^{\epsilon'}}{V^\beta(x)} \le 7t^2n^{2\epsilon'}\sup_{x\in X}\frac{1 \vee \|x\|^2}{V^\beta(x)} \le cn^{2\epsilon'}$$

for any $\beta \in (0, 1/2]$ by Lemma ITTl, where $c = c(t,\beta)$. So, assumption (A4) holds for any $\beta \in (0, 1/2] \cap (0, 1 - \alpha)$. In particular, we can select $\beta$ so that $\epsilon < \kappa_*^{-1}[(1/2) \wedge (1 - \alpha - \beta)]$. Clearly, $\sum_k k^{\kappa_*\epsilon - 1}\eta_k < \infty$, so all the conditions of Theorem 1 are satisfied, implying that the strong law of large numbers holds for the constrained process $(X_n^{(t)}, S_n^{(t)})$ for all $t \ge 1$.
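The elementary bound behind (A4) can be spot-checked on random inputs. The sketch rests on assumptions of ours, not statements of the source: $H(s,x) = (x - m,\ (x - m)(x - m)^T - v)$ for $s = (m, v)$; $|H|$ is the Euclidean norm of the first component plus the Frobenius norm of the second; and $|s| \le T$ means $\|m\| \le T$ and $\|v\|_F \le T$, with $T := tn^{\epsilon'} \ge 1$. Under these conventions the five terms in the display above bound $|H(s,x)|$, and their coefficients sum to $7$ once each is bounded by $T^2(1 \vee \|x\|^2)$.

```python
import math
import random


def norm(z):
    return math.sqrt(sum(c * c for c in z))


def frob(mat):
    return math.sqrt(sum(c * c for row in mat for c in row))


def H(m, v, x):
    """Assumed AM increment: (x - m, (x - m)(x - m)^T - v)."""
    w = [xi - mi for xi, mi in zip(x, m)]
    return w, [[w[i] * w[j] - v[i][j] for j in range(len(w))] for i in range(len(w))]


def size(h):
    first, second = h
    return norm(first) + frob(second)


def random_in_ball(rng, dim, radius):
    """Random vector with Euclidean norm at most `radius`."""
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    scale = radius * rng.random() / norm(z)
    return [scale * c for c in z]


def bound_holds(T, dim=3, trials=20_000, seed=2):
    """Check |H(s, x)| <= 7 T^2 (1 v ||x||^2) whenever ||m|| <= T and ||v||_F <= T."""
    rng = random.Random(seed)
    for _ in range(trials):
        m = random_in_ball(rng, dim, T)
        vflat = random_in_ball(rng, dim * dim, T)          # ||v||_F <= T
        v = [vflat[i * dim:(i + 1) * dim] for i in range(dim)]
        x = [rng.gauss(0.0, 3.0) for _ in range(dim)]
        if size(H(m, v, x)) > 7.0 * T * T * max(1.0, norm(x) ** 2):
            return False
    return True


print(bound_holds(1.0), bound_holds(5.0))
```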

Define $B^{(t)} := \{\omega : S_n(\omega) \in K_n^{(t)} \text{ for all } n \ge 0\}$. We can construct the constrained processes so that they coincide with the original process in $B^{(t)}$. That is, for $\omega \in B^{(t)}$ we have $(X_n(\omega), S_n(\omega)) = (X_n^{(t)}(\omega), S_n^{(t)}(\omega))$ for all $n \ge 0$. Lemma 15 ensures that $\mathbb{P}(\forall n \ge 0 : S_n \in K_n^{(t)}) \ge p(t)$, where $p(t) \to 1$ as $t \to \infty$. As in the proof of Theorem 1, we can use the Borel-Cantelli lemma to deduce that (37) holds almost surely.

We finally verify the central limit theorem (38). Define $m_\pi := \int x\pi(x)\,dx$ and $v_\pi := \int xx^T\pi(x)\,dx - m_\pi m_\pi^T + \kappa I$ as the mean and (modified) covariance of the distribution $\pi$, which are finite, as observed in Remark 11. In addition, as noted in Remark 12, $S_n \to s_\pi := (m_\pi, v_\pi)$ almost surely by (37). Therefore, if one denotes $A_t := \{\sup_{n\ge0}|S_n| \le t\}$, then $\mathbb{P}(A_t) \to 1$ as $t \to \infty$.

Fix $t > |s_\pi| \vee |s_0|$. Define the sets $K_n^{(t)} := K^{(t)} := \{s \in S : |s| \le t\}$ for all $n \ge 0$, and let

$$\sigma_n^{(t)}(s, s') := \begin{cases} s + s', & \text{if } S_{k-1} + \eta_kH(S_{k-1}, X_k) \in \operatorname{int}K^{(t)} \text{ for all } 1 \le k \le n,\\ s, & \text{otherwise,}\end{cases}$$

where $\operatorname{int}K^{(t)}$ stands for the interior of $K^{(t)}$. Define the constrained process $(X_n^{(t)}, S_n^{(t)})_{n\ge0}$ following the framework of Section 2 with constraint functions $\sigma_n^{(t)}$. Here one observes that our constraints $\sigma_n^{(t)}$ correspond to stopping the adaptation at the time of the possible first exit from the interior of $K^{(t)}$, whence Remark 9 applies in the present situation. With this definition, the assumptions (A1)-(A4) are satisfied with some $c = c(t) \ge 1$ and $\epsilon = 0$, and, similarly as above, we obtain for $s, s' \in K_n^{(t)}$

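$$|H(s,x) - H(s',x)| \le \|s^{(m)} - s'^{(m)}\| + \|s^{(v)} - s'^{(v)}\| + \big\|(x - s^{(m)})(x - s^{(m)})^T - (x - s'^{(m)})(x - s'^{(m)})^T\big\|$$

$$\le \big[1 + 2\|x\| + 2(\|s^{(m)}\| \vee \|s'^{(m)}\|)\big]\,\|s^{(m)} - s'^{(m)}\| + \|s^{(v)} - s'^{(v)}\| \le c_t(1 \vee \|x\|)\,|s - s'|,$$

and then that

$$\|H(s,\cdot) - H(s',\cdot)\|_{V^\beta} \le c_t|s - s'|\sup_{x\in X}\frac{1 \vee \|x\|}{V^\beta(x)} \le c_t'|s - s'|$$

for $s, s' \in K_n^{(t)}$, establishing (A5) for any $\beta > 0$.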
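The first Lipschitz estimate above can be checked on random inputs. The sketch assumes (our reading, not a statement of the source) that $H(s,x) = (x - m,\ (x - m)(x - m)^T - v)$ for $s = (m, v)$, and measures $|H(s,x) - H(s',x)|$ as the Euclidean norm of the first difference plus the Frobenius norm of the second. The bracket $[1 + 2\|x\| + 2(\|m\| \vee \|m'\|)]$ comes from the identity $ww^T - w'w'^T = w(w - w')^T + (w - w')w'^T$ with $w = x - m$ and $w' = x - m'$.

```python
import math
import random


def norm(z):
    return math.sqrt(sum(c * c for c in z))


def frob(mat):
    return math.sqrt(sum(c * c for row in mat for c in row))


def H(m, v, x):
    """Assumed AM increment: (x - m, (x - m)(x - m)^T - v)."""
    w = [xi - mi for xi, mi in zip(x, m)]
    return w, [[w[i] * w[j] - v[i][j] for j in range(len(w))] for i in range(len(w))]


def diff_size(h1, h2):
    """Euclidean norm of the vector difference plus Frobenius norm of the matrix difference."""
    a = [p - q for p, q in zip(h1[0], h2[0])]
    b = [[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(h1[1], h2[1])]
    return norm(a) + frob(b)


def lipschitz_bound_holds(dim=3, trials=20_000, seed=3):
    rng = random.Random(seed)
    for _ in range(trials):
        m1 = [rng.gauss(0.0, 2.0) for _ in range(dim)]
        m2 = [rng.gauss(0.0, 2.0) for _ in range(dim)]
        v1 = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
        v2 = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(dim)]
        x = [rng.gauss(0.0, 5.0) for _ in range(dim)]
        lhs = diff_size(H(m1, v1, x), H(m2, v2, x))
        vdiff = frob([[p - q for p, q in zip(r1, r2)] for r1, r2 in zip(v1, v2)])
        mdiff = norm([p - q for p, q in zip(m1, m2)])
        rhs = (1.0 + 2.0 * norm(x) + 2.0 * max(norm(m1), norm(m2))) * mdiff + vdiff
        if lhs > rhs * (1.0 + 1e-12) + 1e-12:   # small slack for rounding
            return False
    return True


print(lipschitz_bound_holds())
```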

The process $(X_n^{(t)}, S_n^{(t)})_{n\ge0}$ coincides with the AM chain $(X_n, S_n)_{n\ge0}$ in $A_t$, in which the adaptation parameters $S_n^{(t)}$ converge almost surely to $s_\pi \in \operatorname{int}K^{(t)}$. In the complement of $A_t$, the parameters $S_n^{(t)}$ converge almost surely to some $S_\infty^{(t)} \in \operatorname{int}K^{(t)}$. We can apply Theorem 7 to deduce that

$$\frac{1}{\sqrt n}\sum_{k=1}^n\big[f(X_k^{(t)}) - \pi(f)\big] \xrightarrow{n\to\infty} Z^{(t)}$$

in distribution, where $Z^{(t)}$ is a random variable with the characteristic function $\varphi_{Z^{(t)}}(u) = \mathbb{E}\,e^{-\frac12\sigma_t^2u^2}$, where $\sigma_t^2$ is finite almost surely and equals $\sigma^2$ in $A_t$. Let $Z \sim N(0,\sigma^2)$, i.e. $\varphi_Z(u) = e^{-\frac12\sigma^2u^2}$. For fixed $u \in \mathbb{R}$, we have



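$$\big|\varphi_{Z^{(t)}}(u) - \varphi_Z(u)\big| \le \int_{A_t^c}\big|e^{-\frac12\sigma_t^2u^2} - e^{-\frac12\sigma^2u^2}\big|\,d\mathbb{P} \le \varphi_Z(u)\big[1 - \mathbb{P}(A_t)\big]\big[1 \vee (e^{\frac12\sigma^2u^2} - 1)\big],$$

so the characteristic functions $\varphi_{Z^{(t)}}$ converge pointwise to $\varphi_Z$ as $t \to \infty$, and hence $Z^{(t)} \to Z$ in distribution.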
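The pointwise estimate above rests on the elementary inequality $|e^{-a} - e^{-b}| \le e^{-b}[1 \vee (e^{b} - 1)]$ for $a, b \ge 0$ (here $a = \tfrac12\sigma_t^2u^2$ and $b = \tfrac12\sigma^2u^2$): writing $e^{-a} - e^{-b} = e^{-b}(e^{b-a} - 1)$ with $e^{b-a} \in (0, e^{b}]$ gives it directly. A grid check, offered only as a quick sanity test:

```python
import math


def gap(a: float, b: float) -> float:
    return abs(math.exp(-a) - math.exp(-b))


def bound(b: float) -> float:
    return math.exp(-b) * max(1.0, math.exp(b) - 1.0)


grid = [0.05 * i for i in range(201)]      # a, b in [0, 10]
worst = max(gap(a, b) - bound(b) for a in grid for b in grid)
print(worst <= 1e-12)
```

Equality is attained at $a = 0$ once $e^{b} \ge 2$, so the inequality cannot be improved in general.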

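Let $\psi : \mathbb{R} \to \mathbb{R}$ be bounded and continuous, and denote the probability measures induced by random variables as $\mu_X(A) := \mathbb{P}(X \in A)$. Write $Y_n^{(t)} := n^{-1/2}\sum_{k=1}^n[f(X_k^{(t)}) - \pi(f)]$ and $Y_n := n^{-1/2}\sum_{k=1}^n[f(X_k) - \pi(f)]$. We can choose a non-decreasing sequence $(t_n)_{n\ge1}$ of positive numbers such that $t_n \to \infty$ and

$$\big|\mu_{Y_n^{(t_n)}}(\psi) - \mu_{Z^{(t_n)}}(\psi)\big| \xrightarrow{n\to\infty} 0.$$

Since $Y_n$ is equal to $Y_n^{(t_n)}$ in $A_{t_n}$, we have

$$\big|\mu_{Y_n}(\psi) - \mu_{Y_n^{(t_n)}}(\psi)\big| \le 2\big[1 - \mathbb{P}(A_{t_n})\big]\sup_x|\psi(x)| \xrightarrow{n\to\infty} 0.$$

We conclude that $|\mu_{Y_n}(\psi) - \mu_Z(\psi)| \to 0$ as $n \to \infty$, and (38) holds. □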

Remark 19. Since $\epsilon > 0$ can be selected arbitrarily small in the proof of Theorem 13, it is only required for (37) to hold that the adaptation weights $\eta_n \in (0,1)$ are decreasing and that $\sum_k k^{\epsilon-1}\eta_k < \infty$ holds for some $\epsilon > 0$. In particular, one can choose $\eta_n := (n + 1)^{-\gamma}$ for any $\gamma > 0$.

Remark 20. The condition (35) implies the super-exponential decay of the tails of $\pi$,

$$\limsup_{\|x\|\to\infty}\frac{x}{\|x\|}\cdot\nabla\log\pi(x) = -\infty. \tag{44}$$

This condition, together with the contour regularity condition (36), is commonly used to ensure geometric ergodicity of a random-walk Metropolis algorithm, and many standard distributions fulfil both fT^ l. The decay condition (35) is only slightly more stringent than (44).
Appendix A. Proof of Lemma 2

We provide a restatement of part of a theorem by Meyn and Tweedie [ijj] before proving Lemma 2. For more recent work on quantitative convergence bounds, we refer to [6].
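Theorem 21. Suppose that the following drift and minorisation conditions hold:

$$PV(x) \le \lambda V(x) + b\mathbf{1}_C(x), \qquad \forall x \in X,$$

$$P(x,A) \ge \delta\nu(A), \qquad \forall x \in C,\ \forall A \subset X,$$

for constants $\lambda < 1$, $b < \infty$ and $\delta > 0$, a set $C \subset X$, and a probability measure $\nu$ on $C$. Moreover, suppose that $\sup_{x\in C}V(x) \le b$. Then, for all $k \ge 1$,

$$\|P^k(x,\cdot) - \pi(\cdot)\|_V \le V(x)(1 + \gamma)\frac{\rho}{\rho - \vartheta}\rho^k$$

for any $\rho > \vartheta := 1 - M^{-1}$, where

$$M := \frac{1}{(1 - \check\lambda)^2}\Big[1 - \check\lambda + \check b + \check b^2 + \zeta\big(\check b(1 - \check\lambda) + \check b^2\big)\Big]$$

is defined in terms of

$$\gamma := \delta^{-2}\big[4b + 2\delta\lambda b\big], \qquad \check\lambda := \frac{\lambda + \gamma}{1 + \gamma} < 1, \qquad \check b := b + \gamma < \infty,$$

and the bound

$$\zeta := \frac{4 - \delta^2}{\delta^5}\left(\frac{b}{1 - \lambda}\right)^2.$$

Proof. [ii, Theorem 2.3]. □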
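The constants of Theorem 21 are explicit, so they can be evaluated directly. The sketch below (hypothetical function names; the sample values $\lambda = 1/2$, $b = 2$, $\delta = 1/2$ are arbitrary) computes $\gamma$, $\check\lambda$, $\check b$, $\zeta$, $M$ and $\vartheta$ as restated above, and returns the factor $(1+\gamma)\rho/(\rho - \vartheta)$ for a chosen $\rho$. It evaluates $1 - \check\lambda = (1 - \lambda)/(1 + \gamma)$ directly to avoid cancellation. Bounds of this type are usually very conservative; in the proof of Lemma 2 only their polynomial growth matters.

```python
def meyn_tweedie_constants(lam: float, b: float, delta: float) -> dict:
    """Constants of Theorem 21 as restated above (with sup_C V <= b)."""
    if not (0.0 <= lam < 1.0 and b > 0.0 and 0.0 < delta <= 1.0):
        raise ValueError("need 0 <= lam < 1, b > 0 and 0 < delta <= 1")
    gamma = (4.0 * b + 2.0 * delta * lam * b) / delta ** 2
    om_check = (1.0 - lam) / (1.0 + gamma)          # 1 - lambda_check, no cancellation
    b_check = b + gamma
    zeta = (4.0 - delta ** 2) / delta ** 5 * (b / (1.0 - lam)) ** 2
    M = (om_check + b_check + b_check ** 2
         + zeta * (b_check * om_check + b_check ** 2)) / om_check ** 2
    return {"gamma": gamma, "lam_check": 1.0 - om_check, "b_check": b_check,
            "zeta": zeta, "M": M, "theta": 1.0 - 1.0 / M}


def convergence_factor(consts: dict, rho: float) -> float:
    """(1 + gamma) rho / (rho - theta): the multiplier of V(x) rho^k in Theorem 21."""
    if not consts["theta"] < rho:
        raise ValueError("rho must exceed theta")
    return (1.0 + consts["gamma"]) * rho / (rho - consts["theta"])


consts = meyn_tweedie_constants(0.5, 2.0, 0.5)
rho = 0.5 * (1.0 + consts["theta"])
print(consts["M"], consts["theta"], convergence_factor(consts, rho))
```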

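Proof of Lemma 2. Observe that $P_sV(x) = \mathbb{E}[V(X_{n+1}) \mid X_n = x, S_n = s]$, and therefore by Jensen's inequality, (A2) implies for $x \notin C_n$ that

$$P_sV^r(x) \le \big(P_sV(x)\big)^r \le \lambda_n^rV^r(x).$$

We can bound $\tilde\lambda_n := \lambda_n^r \le (1 - c^{-1}n^{-\epsilon})^r \le 1 - rc^{-1}n^{-\epsilon}$, implying

$$(1 - \tilde\lambda_n)^{-1} \le r^{-1}cn^{\epsilon}$$

whenever $r \in (0,1]$. Similarly, for $x \in C_n$ one has $P_sV^r(x) \le (\sup_{z\in C_n}V(z) + b_n)^r \le (2b_n)^r$, so by letting $\tilde b_n := (2b_n)^r$ we obtain the drift inequality

$$P_sV^r(x) \le \tilde\lambda_nV^r(x) + \tilde b_n\mathbf{1}_{C_n}(x),$$

and we can bound $\tilde b_n \le (2cn^{\epsilon})^r$. We have the bound $(1 - \tilde\lambda_n)^{-1} \vee \tilde b_n \le \tilde cn^{\epsilon}$ with some $\tilde c = \tilde c(c,r) \ge 1$.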

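Now, we can apply Theorem 21, where we can estimate the constants

$$\gamma_n = \delta_n^{-2}\big[4\tilde b_n + 2\delta_n\tilde\lambda_n\tilde b_n\big] \le a_1n^{3\epsilon},$$

$$\check b_n = \tilde b_n + \gamma_n \le (\tilde c + a_1)n^{3\epsilon} =: a_2n^{3\epsilon},$$

and consequently

$$1 - \check\lambda_n = \frac{1 - \tilde\lambda_n}{1 + \gamma_n} \ge \frac{\tilde c^{-1}n^{-\epsilon}}{1 + a_1n^{3\epsilon}} \ge \frac{\tilde c^{-1}}{1 + a_1}n^{-4\epsilon} =: a_3^{-1}n^{-4\epsilon}.$$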
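Moreover,

$$\zeta_n = \frac{4 - \delta_n^2}{\delta_n^5}\left(\frac{\tilde b_n}{1 - \tilde\lambda_n}\right)^2 \le 4c^5\tilde c^4n^{9\epsilon} =: a_4n^{9\epsilon},$$

and then

$$M_n = \frac{1}{(1 - \check\lambda_n)^2}\Big[1 - \check\lambda_n + \check b_n + \check b_n^2 + \zeta_n\big(\check b_n(1 - \check\lambda_n) + \check b_n^2\big)\Big] \le (a_3n^{4\epsilon})^2\Big[1 + \check b_n + \check b_n^2 + \zeta_n(\check b_n + \check b_n^2)\Big] \le 5a_2^2a_3^2a_4\,n^{23\epsilon} =: a_5n^{23\epsilon},$$

since we can assume that $\check b_n, \zeta_n \ge 1$. Now, with $\vartheta_n := 1 - M_n^{-1}$, we can choose $\rho_n \in (\vartheta_n, 1)$ by letting $\rho_n := (1 + \vartheta_n)/2$. We have

$$\rho_n - \vartheta_n = 1 - \rho_n = \tfrac12(1 - \vartheta_n) = \tfrac12M_n^{-1} \ge \tfrac12a_5^{-1}n^{-23\epsilon}.$$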

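Finally, from Theorem 21 one obtains the bound

$$\|P_s^k(x,\cdot) - \pi(\cdot)\|_{V^r} \le V^r(x)L_n\rho_n^k,$$

where

$$L_n := (1 + \gamma_n)\frac{\rho_n}{\rho_n - \vartheta_n} \le (1 + a_1n^{3\epsilon})(a_6n^{23\epsilon}) \le a_7n^{26\epsilon},$$

with $a_6 := 2a_5$ and $a_7 := (1 + a_1)a_6$. This concludes the proof with $\kappa_2 = 26$ and $c_2 = a_7$. □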
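The bookkeeping of exponents in this proof can be checked numerically. As a hypothetical worst case we take $r = 1$ and let (A2) hold with equality, $\lambda_n = 1 - c^{-1}n^{-\epsilon}$, $b_n = cn^{\epsilon}$ and $\delta_n = c^{-1}n^{-\epsilon}$, so that $\tilde\lambda_n = \lambda_n$ and $\tilde b_n = 2b_n$; these choices are ours, for illustration only. Writing $N := n^{\epsilon}$, the constant $L_n$ then grows like $N^{26}$, in line with the exponent $26\epsilon$. The sketch uses $\rho_n/(\rho_n - \vartheta_n) = 2M_n - 1$, valid for $\rho_n = (1 + \vartheta_n)/2$, because $\vartheta_n$ is numerically indistinguishable from $1$ for large $N$.

```python
def L_of(N: float, c: float = 2.0) -> float:
    """L_n of the proof as a function of N = n^eps (worst case r = 1; see text)."""
    one_minus_lam = 1.0 / (c * N)                 # 1 - lambda~_n
    lam = 1.0 - one_minus_lam
    b = 2.0 * c * N                               # b~_n = 2 b_n
    delta = 1.0 / (c * N)
    gamma = (4.0 * b + 2.0 * delta * lam * b) / delta ** 2
    om_check = one_minus_lam / (1.0 + gamma)      # 1 - lambda_check_n
    b_check = b + gamma
    zeta = (4.0 - delta ** 2) / delta ** 5 * (b / one_minus_lam) ** 2
    M = (om_check + b_check + b_check ** 2
         + zeta * (b_check * om_check + b_check ** 2)) / om_check ** 2
    return (1.0 + gamma) * (2.0 * M - 1.0)        # (1 + gamma) rho / (rho - theta)


for N in (1e2, 1e3, 1e4):
    print(N, L_of(N) / N ** 26, L_of(N) / N ** 25)
```

The first ratio settles to a constant, while the second keeps growing by a factor of about ten per decade of $N$.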

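Appendix B. Some General Inequalities

Theorem 22 (Birnbaum and Marshall). Let $(X_k)_{k=1}^n$ be random variables such that

$$\mathbb{E}\big[|X_k| \mid \mathcal{F}_{k-1}\big] \ge \psi_k|X_{k-1}|,$$

where $\mathcal{F}_k := \sigma(X_1, \ldots, X_k)$ and $\psi_k \ge 0$. Let $a_k \ge 0$, and define

$$b_k := \max\Big\{a_k,\ a_{k+1}\psi_{k+1},\ \ldots,\ a_n\prod_{i=k+1}^n\psi_i\Big\}$$

for $1 \le k \le n$, and $b_{n+1} := 0$. If $p \ge 1$ is such that $\mathbb{E}|X_k|^p < \infty$ for all $1 \le k \le n$, then

$$\mathbb{P}\Big(\max_{1\le k\le n}a_k|X_k| \ge 1\Big) \le \sum_{k=1}^n\big(b_k^p - \psi_{k+1}^pb_{k+1}^p\big)\mathbb{E}|X_k|^p.$$

Proof. [tI, Theorem 2.1]. □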
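Corollary 23. Let $(M_k)_{k=1}^n$ be a martingale with respect to $(\mathcal{F}_k)_{k=1}^n$. Let $(a_k)_{k=1}^n$ be a strictly positive non-increasing sequence. If $p \ge 1$ is such that $\mathbb{E}|M_k|^p < \infty$ for all $1 \le k \le n$, then for $1 \le m \le n$,

$$\mathbb{P}\Big(\max_{m\le k\le n}a_k|M_k| \ge 1\Big) \le a_n^p\,\mathbb{E}|M_n|^p + \sum_{k=m}^{n-1}\big(a_k^p - a_{k+1}^p\big)\mathbb{E}|M_k|^p.$$

Proof. By Jensen's inequality,

$$\mathbb{E}\big[|M_k| \mid \mathcal{F}_{k-1}\big] \ge \big|\mathbb{E}[M_k \mid \mathcal{F}_{k-1}]\big| = |M_{k-1}|.$$

Define $\psi_k := 1$ for $1 \le k \le n$, and $\tilde a_k := 0$ for $1 \le k < m$ and $\tilde a_k := a_k$ for $m \le k \le n$. The result follows from Theorem 22 applied with $\tilde a_k$ in place of $a_k$. □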
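As an illustration of Corollary 23 with $p = 2$, take the simple symmetric random walk $M_k = \xi_1 + \dots + \xi_k$ with independent $\pm1$ steps, so that $\mathbb{E}|M_k|^2 = k$, and the non-increasing weights $a_k = 1/(3\sqrt k)$; both choices are ours. For $m = 1$ the right-hand side telescopes to $H_n/9$, where $H_n$ is the $n$-th harmonic number, and a seeded Monte Carlo estimate of the left-hand side should fall below it. The function names are hypothetical.

```python
import math
import random


def corollary_bound(a, second_moments, m):
    """Right-hand side of Corollary 23 with p = 2; lists are 0-based, k is 1-based."""
    n = len(a)
    total = a[n - 1] ** 2 * second_moments[n - 1]
    for k in range(m, n):                        # k = m, ..., n - 1
        total += (a[k - 1] ** 2 - a[k] ** 2) * second_moments[k - 1]
    return total


def mc_probability(a, m, trials=5_000, seed=0):
    """Monte Carlo estimate of P(max_{m <= k <= n} a_k |M_k| >= 1)."""
    rng = random.Random(seed)
    n = len(a)
    hits = 0
    for _ in range(trials):
        s = 0
        for k in range(1, n + 1):
            s += 1 if rng.random() < 0.5 else -1
            if k >= m and a[k - 1] * abs(s) >= 1.0:
                hits += 1
                break
    return hits / trials


n = 200
a = [1.0 / (3.0 * math.sqrt(k)) for k in range(1, n + 1)]
moments = [float(k) for k in range(1, n + 1)]
bound = corollary_bound(a, moments, m=1)
harmonic = sum(1.0 / k for k in range(1, n + 1))
print(bound, harmonic / 9.0, mc_probability(a, m=1))
```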



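The following lemma is a conditional version of [SI, Lemma 3.3], and was also stated in [1, Lemma 10].

Lemma 24 (Dvoretzky). Let $X$ be a square integrable random variable and $\mathcal{G}$ a $\sigma$-algebra on a probability space. Then, for every $\epsilon > 0$,

$$\mathbb{E}\Big[(X - \mathbb{E}[X \mid \mathcal{G}])^2\mathbf{1}_{\{|X - \mathbb{E}[X\mid\mathcal{G}]| > 2\epsilon\}} \,\Big|\, \mathcal{G}\Big] \le 4\,\mathbb{E}\Big[X^2\mathbf{1}_{\{|X| > \epsilon\}} \,\Big|\, \mathcal{G}\Big].$$

Proof. Notice that $\mathbf{1}_{\{|X - \mathbb{E}[X\mid\mathcal{G}]| > 2\epsilon\}} \le \mathbf{1}_{\{|\mathbb{E}[X\mid\mathcal{G}]| > \epsilon\}} + \mathbf{1}_{\{|X| > \epsilon,\ |\mathbb{E}[X\mid\mathcal{G}]| \le \epsilon\}}$. We can estimate

$$\mathbb{E}\Big[(X - \mathbb{E}[X\mid\mathcal{G}])^2\mathbf{1}_{\{|\mathbb{E}[X\mid\mathcal{G}]| > \epsilon\}} \,\Big|\, \mathcal{G}\Big] = \mathbb{E}\Big[\big(X^2 - \mathbb{E}[X\mid\mathcal{G}]^2\big)\mathbf{1}_{\{|\mathbb{E}[X\mid\mathcal{G}]| > \epsilon\}} \,\Big|\, \mathcal{G}\Big] \le \mathbb{E}\big[(X^2 - \epsilon^2)^+ \,\big|\, \mathcal{G}\big] = \mathbb{E}\Big[(X^2 - \epsilon^2)\mathbf{1}_{\{|X| > \epsilon\}} \,\Big|\, \mathcal{G}\Big]. \tag{45}$$

Similarly, we obtain

$$\mathbb{E}\Big[(X - \mathbb{E}[X\mid\mathcal{G}])^2\mathbf{1}_{\{|X| > \epsilon,\ |\mathbb{E}[X\mid\mathcal{G}]| \le \epsilon\}} \,\Big|\, \mathcal{G}\Big] \le \mathbb{E}\Big[(X^2 + 2\epsilon|X| + \epsilon^2)\mathbf{1}_{\{|X| > \epsilon,\ |\mathbb{E}[X\mid\mathcal{G}]| \le \epsilon\}} \,\Big|\, \mathcal{G}\Big] \le \mathbb{E}\Big[(3X^2 + \epsilon^2)\mathbf{1}_{\{|X| > \epsilon\}} \,\Big|\, \mathcal{G}\Big]. \tag{46}$$

Summing (45) and (46) concludes the proof. □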
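When $\mathcal{G}$ is generated by a finite partition, the conditional statement of Lemma 24 reduces to the unconditional one on each atom, so it can be verified exactly for discrete distributions. The sketch below (hypothetical names; random test distributions of our own choosing) computes both sides on every atom.

```python
import random
from collections import defaultdict


def sides(values, probs, eps):
    """Both sides of Lemma 24 for a discrete law with trivial conditioning."""
    total = sum(probs)
    mean = sum(p * x for x, p in zip(values, probs)) / total
    lhs = sum(p * (x - mean) ** 2 for x, p in zip(values, probs)
              if abs(x - mean) > 2.0 * eps) / total
    rhs = 4.0 * sum(p * x * x for x, p in zip(values, probs) if abs(x) > eps) / total
    return lhs, rhs


def lemma_holds_on_atoms(values, probs, labels, eps):
    """Check the inequality on every atom of the partition given by `labels`."""
    atoms = defaultdict(lambda: ([], []))
    for x, p, lab in zip(values, probs, labels):
        atoms[lab][0].append(x)
        atoms[lab][1].append(p)
    return all(l <= r + 1e-12 for l, r in (sides(xs, ps, eps) for xs, ps in atoms.values()))


rng = random.Random(4)
ok = True
for _ in range(500):
    k = rng.randint(2, 8)
    values = [rng.uniform(-5.0, 5.0) for _ in range(k)]
    probs = [rng.random() + 1e-9 for _ in range(k)]
    labels = [rng.randint(0, 2) for _ in range(k)]
    for eps in (0.1, 0.5, 1.0, 2.0, 5.0):
        ok = ok and lemma_holds_on_atoms(values, probs, labels, eps)
print(ok)
```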

Appendix C. Contour Surface Containment

Lemma 25. Suppose $A \subset \mathbb{R}^d$ is a smooth surface parameterised by the unit sphere $S^d$, that is, $A = \{ug(u) : u \in S^d\}$ with a continuously differentiable radial function $g : S^d \to (0,\infty)$. Assume also that the outer-pointing normal $n$ of $A$ satisfies $n(x)\cdot x/\|x\| \ge \beta$ for all $x \in A$ with some constant $\beta > 0$. There is a constant $M < \infty$ depending only on $\beta$ such that for any $x, y \in A$, it holds that $M^{-1} \le \|x\|/\|y\| \le M$.



Proof. Consider first the two-dimensional case. Let $x$ and $y$ be two distinct points in $A$. We employ polar coordinates; thus let $u(\theta)r(\theta) \in A$ with $u(\theta) := [\cos(\theta), \sin(\theta)]^T$ and $r(\theta) := g(u(\theta))$, so that $u(\theta_1)r(\theta_1) = x$ and $u(\theta_2)r(\theta_2) = y$ with $\theta_1, \theta_2 \in [0, 2\pi)$.

Let $\alpha(\theta)$ stand for the (smaller) angle between $u(\theta)$ and the normal of the curve $A$, that is, the curve parametrised by $\theta \mapsto u(\theta)r(\theta)$. Our assumption says that $|\alpha(\theta)| \le \alpha_0 := \arccos(\beta) < \pi/2$ for all $\theta \in [0, 2\pi]$. On the other hand, an elementary computation shows that

$$\tan(\alpha(\theta)) = \frac{|r'(\theta)|}{r(\theta)},$$

and hence we have $\big|\frac{d}{d\theta}\log r(\theta)\big| = |r'(\theta)/r(\theta)| \le \tan\alpha_0$ uniformly. We may estimate $\big|\log\|x\| - \log\|y\|\big| \le 2\pi\tan(\alpha_0)$, yielding the claim with $M = e^{2\pi\tan(\alpha_0)}$.

For $d \ge 3$, take the plane $T$ containing the origin and the points $x$ and $y$. This reduces the situation to two dimensions, since the curve $A \cap T$ inherits the given normal condition of the surface and the radius vector. □
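As a concrete check of Lemma 25, consider the ellipse $A = \{(p\cos t, q\sin t)\}$ in the plane (an example of ours, with $p, q > 0$). Its outer normal at $(p\cos t, q\sin t)$ is proportional to $(\cos t/p, \sin t/q)$, so the smallest value $\beta$ of $n(x)\cdot x/\|x\|$ can be computed on a grid; a short computation shows it equals $2pq/(p^2 + q^2)$, attained at $t = \pi/4$. The sketch compares the radius ratio $\max(p,q)/\min(p,q)$ with the constant $M = e^{2\pi\tan(\arccos\beta)}$ from the proof; the helper names are hypothetical.

```python
import math


def normal_cosine(p: float, q: float, t: float) -> float:
    """n(x) . x / ||x|| at x = (p cos t, q sin t) on the ellipse."""
    x = (p * math.cos(t), q * math.sin(t))
    nrm = (math.cos(t) / p, math.sin(t) / q)        # outer normal direction
    return (nrm[0] * x[0] + nrm[1] * x[1]) / (math.hypot(*nrm) * math.hypot(*x))


def check_ellipse(p: float, q: float, n: int = 3600):
    beta = min(normal_cosine(p, q, 2.0 * math.pi * i / n) for i in range(n))
    beta = min(1.0, beta)                           # guard acos against rounding above 1
    M = math.exp(2.0 * math.pi * math.tan(math.acos(beta)))
    ratio = max(p, q) / min(p, q)                   # max ||x|| / min ||x|| on the ellipse
    return beta, M, ratio


beta, M, ratio = check_ellipse(2.0, 1.0)
print(beta, M, ratio, ratio <= M)
```

The constant from the proof is far from sharp (here $M \approx 111$ against a true ratio of $2$), which is harmless since only its finiteness is used.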



Eero Saksman, Department of Mathematics and Statistics, University of Helsinki, 
P.O.Box 68, FI-00014 University of Helsinki, Finland 
E-mail address: eero.saksman@helsinki.fi

Matti Vihola, Department of Mathematics and Statistics, University of Jyvaskyla, 
P.O.Box 35 (MaD), FI-40014 University of Jyvaskyla, Finland 
E-mail address: matti.vihola@iki.fi 
URL: http://iki.fi/mvihola/ 



